Hi,

Yes. It looks like sendalot is being set incorrectly in the SACK rexmit
path. The problem is that if there's a lot of data in the SACK scoreboard,
and cwnd allows re-sending only part of it, sendalot gets set to 1
each time through the loop, even once len has dropped to 0, so we keep
looping.

The patch referenced below will cause just one SACK segment to be retransmitted
in a call to tcp_output(), even if cwnd allows us to retransmit more segments.
This will depress throughput in the face of packet loss.

I think it might be better to set sendalot only when len > 0 instead. This will
cause us to retransmit as many SACK segments as we're allowed to send (per cwin)
in a call to tcp_output() before we send new data.
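
To make the difference between the three behaviours concrete, here is a
rough user-space toy model. It is not the kernel code. Only sendalot, len
and cwin mirror names from tcp_output.c; simulate(), struct result, the
policy enum and PASS_CAP are made up for illustration. It also assumes that
a pass with len == 0 still reaches the point where sendalot is checked,
which is a simplification of the real control flow.

```c
#include <assert.h>

/* Hypothetical policies for when the SACK rexmit path sets sendalot. */
enum policy {
    SET_ALWAYS,  /* set whenever a hole is found, regardless of len */
    SET_NEVER,   /* never set here: one SACK segment per call */
    SET_IF_LEN   /* proposed: set only when len > 0 */
};

struct result {
    int  passes;   /* trips through the simulated "again" loop */
    long sent;     /* bytes retransmitted in this call */
    int  hit_cap;  /* 1 if the safety cap cut the loop off */
};

#define PASS_CAP 1000  /* made-up guard so the demo always terminates */

/*
 * Simulate one call to a tcp_output()-like function during recovery.
 * hole_bytes: un-retransmitted bytes left in the scoreboard hole
 * cwin:       bytes the congestion window still allows
 * maxseg:     segment size (must be positive)
 */
static struct result
simulate(enum policy pol, long hole_bytes, long cwin, long maxseg)
{
    struct result r = {0, 0, 0};
    int sendalot;

    assert(maxseg > 0);
    do {
        long len = 0;

        sendalot = 0;
        if (hole_bytes > 0) {
            /* len = min(cwin, remaining hole), at most one segment */
            len = hole_bytes < cwin ? hole_bytes : cwin;
            if (len > maxseg)
                len = maxseg;
            if (pol == SET_ALWAYS || (pol == SET_IF_LEN && len > 0))
                sendalot = 1;
        }
        hole_bytes -= len;
        cwin -= len;
        r.sent += len;
        r.passes++;
        if (r.passes >= PASS_CAP) {
            r.hit_cap = 1;  /* would have kept spinning */
            break;
        }
    } while (sendalot);
    return r;
}
```

With a 5000-byte hole, a cwin of 3000 and a maxseg of 1000, SET_ALWAYS
keeps looping with len == 0 until the cap stops it, SET_NEVER sends a single
segment and returns, and SET_IF_LEN sends the three segments cwin allows
and then exits on the first pass where len is 0.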

I'm referencing 6.x here, but this applies to 5.3 as well.

crabapple.corp.yahoo.com> p4 diff -dc tcp_output.c 
==== //depot/mohans/freebsd6_nfs/sys/netinet/tcp_output.c#1 - /homes/mohans/p4/mohans/freebsd6_nfs/sys/netinet/tcp_output.c ====
***************
*** 231,242 ****
                                                   tp->snd_recover - p->rxmit));
                } else
                        len = ((long)ulmin(cwin, p->end - p->rxmit));
-               sack_rxmit = 1;
-               sendalot = 1;
                off = p->rxmit - tp->snd_una;
                KASSERT(off >= 0,("%s: sack block to the left of una : %d",
                    __func__, off));
                if (len > 0) {
                        tcpstat.tcps_sack_rexmits++;
                        tcpstat.tcps_sack_rexmit_bytes +=
                            min(len, tp->t_maxseg);
--- 231,242 ----
                                                   tp->snd_recover - p->rxmit));
                } else
                        len = ((long)ulmin(cwin, p->end - p->rxmit));
                off = p->rxmit - tp->snd_una;
                KASSERT(off >= 0,("%s: sack block to the left of una : %d",
                    __func__, off));
                if (len > 0) {
+                       sack_rxmit = 1;
+                       sendalot = 1;
                        tcpstat.tcps_sack_rexmits++;
                        tcpstat.tcps_sack_rexmit_bytes +=
                            min(len, tp->t_maxseg);

Let me know what you think.

thanks,

mohan

Paul Saab (ps@yahoo-inc.com) wrote:
> 

> From: Peter Losher <Peter_Losher@isc.org>
> Organization: ISC
> To: Robert Watson <rwatson@freebsd.org>
> Subject: Re: [Fwd: Re: 5.3 stability?]
> Date: Wed, 27 Oct 2004 00:35:28 -0700
> Cc: Scott Long <scottl@freebsd.org>,
> 	"George V.Neville-Neil" <gnn@neville-neil.com>, ps@freebsd.org,
> 	re@freebsd.org, dhartmei@freebsd.org
> 
> On Tuesday 26 October 2004 06:13 am, Robert Watson wrote:
> 
> > > Off to sleep... :)  Thanks again for your suggestions and advice.
> >
> > Daniel Hartmeier has created the following patch that tweaks the logic
> > for SACK retransmit to match what it is in NetBSD/OpenBSD, and he
> > believes may prevent a spinning scenario with behavior that strongly
> > resembles what we're seeing.  Assuming that the SACK disabling caused
> > the problem to disappear, it sounds like a good place to begin:
> >
> >     http://www.benzedrine.cx/sendalot.diff
> 
> Looks like that did it; the patched kernel (w/ TCP_SACK turned back on)
> has been running for 11+ hours w/ no problems so far.
> 
> -=-
> % uptime
>  7:32AM  up 11:10, 1 user, load averages: 8.11, 7.51, 6.20
> -=-
> 
> I am heading for bed now to catch up on the sleep I lost last night. :)  
> Talk to you all later this morning. :)
> 
> Best Wishes - Peter
> -- 
> Peter_Losher@isc.org | ISC | OpenPGP Key E8048D08 | "The bits must flow"