Load Balancers - connections stuck open, climbing daily
We have two facilities, each served by Coyote Point load balancers. We've always had a problem with the number of open connections climbing higher and higher until performance problems eventually kick in. The connection counts reported by the load balancers don't match the counts on the servers themselves. It's as if every few seconds a connection or two is left open, and over time the numbers get wildly high (in the thousands).
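To see how a small leak turns into thousands of stale entries, here is a rough back-of-the-envelope sketch. It assumes, hypothetically, that the balancer orphans one entry every few seconds and never reclaims it until the server is drained or restarted. The leak rates are illustrative, not measured.

```python
# Back-of-the-envelope growth of orphaned connection entries on the balancer.
# Assumption (illustrative only): one entry is leaked every `seconds_per_leak`
# seconds and is never reaped until the server is drained or restarted.

SECONDS_PER_HOUR = 3600


def leaked_entries(hours: float, seconds_per_leak: float, per_leak: int = 1) -> int:
    """Return how many stale entries accumulate after `hours` of uptime."""
    leaks = int(hours * SECONDS_PER_HOUR // seconds_per_leak)
    return leaks * per_leak


for seconds_per_leak in (5, 10, 30):
    print(f"1 leak every {seconds_per_leak:>2}s -> "
          f"{leaked_entries(1, seconds_per_leak):>5} per hour, "
          f"{leaked_entries(24, seconds_per_leak):>6} per day")
```

Even at one orphaned entry every 30 seconds, the count passes 2,800 within a day, which is consistent with the numbers reaching the thousands.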
We're using the LB devices in a single-network configuration; we've also tried the dual-network setup, with similar results.
The connection counts can be brought back down to "real" numbers by setting a machine's weight to zero for 3-5 minutes. After that, the count on the load balancer matches the number of open connections on the server itself for a while, until it starts climbing again.
Connections are also brought back down if we completely reset www services on a server, or reboot it.
We've set the client and server connection timeouts on both the node servers and the load balancers fairly low (we've tried them high as well), we're not using keepalives (enabling them makes things worse), and nothing seems to stop the connections on the LB devices from stacking up over time.
This becomes more of a problem when one of the node servers has to be restarted. Its connection count is then much lower than the other nodes' counts, which skews the load-balancing algorithm and sends much more traffic to the freshly recycled server (Coyote Point's Adaptive load-balancing algorithm looks at the number of connections each server currently has).
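The skew after a restart can be illustrated with a simplified model. Coyote Point's Adaptive algorithm is proprietary, so this sketch assumes plain least-connections selection on the counts the balancer *reports*, including any stale entries; server names and counts are hypothetical.

```python
# Simplified model of connection-count-based balancing. Coyote Point's Adaptive
# algorithm is proprietary; plain least-connections is an assumption here.


def pick_server(reported: dict[str, int]) -> str:
    """Choose the server with the fewest connections the balancer *thinks* it has."""
    return min(reported, key=reported.get)


def simulate(reported: dict[str, int], new_requests: int) -> dict[str, int]:
    """Route `new_requests` and count how many each server receives.

    Each routed request is assumed to stay open for the whole simulation,
    which is how a stale (leaked) entry looks to the balancer.
    """
    counts = dict(reported)
    received = {name: 0 for name in reported}
    for _ in range(new_requests):
        target = pick_server(counts)
        counts[target] += 1
        received[target] += 1
    return received


# Two nodes carrying 3000 stale entries each, one freshly rebooted node at 0:
# every one of the next 1000 requests lands on the rebooted node.
print(simulate({"web1": 3000, "web2": 3000, "web3": 0}, 1000))
```

In this model the rebooted node keeps winning until its count catches up with the inflated counts on the other nodes, which matches the behavior described for a freshly recycled server.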
I've considered writing a shell script that would rewrite the eq.conf file every few hours, setting each server's weight to 0, restarting the LBD daemon, and then putting the weight back to 100 after a few minutes. I'm sure that would work, but it's a kludge.
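The drain-and-restore workaround can be sketched as a schedule generator. This is a hypothetical illustration: weights are modeled as a plain dict, and the names `rotation_schedule` and `apply_step` are made up. Turning each step into an actual eq.conf edit and LBD restart is deliberately left out, because the file syntax and restart command aren't given here.

```python
# Sketch of the "drain one server at a time" kludge as a schedule generator.
# Hypothetical: weights are modeled as a plain dict; writing eq.conf and
# restarting LBD are not shown, since that syntax depends on the firmware.

from typing import Dict, List, Tuple

Step = Tuple[int, str, int]  # (minute offset, server name, new weight)


def rotation_schedule(servers: List[str], drain_minutes: int = 5,
                      gap_minutes: int = 10, normal_weight: int = 100) -> List[Step]:
    """Drain servers one at a time so at most one is at weight 0 at any moment."""
    if drain_minutes <= 0 or gap_minutes < 0:
        raise ValueError("drain_minutes must be > 0 and gap_minutes >= 0")
    steps: List[Step] = []
    t = 0
    for name in servers:
        steps.append((t, name, 0))                               # stop new connections
        steps.append((t + drain_minutes, name, normal_weight))   # restore the weight
        t += drain_minutes + gap_minutes                         # pause before the next node
    return steps


def apply_step(weights: Dict[str, int], step: Step) -> Dict[str, int]:
    """Return a copy of `weights` with one step applied (no file I/O here)."""
    _, name, weight = step
    updated = dict(weights)
    updated[name] = weight
    return updated


for step in rotation_schedule(["web1", "web2"], drain_minutes=5, gap_minutes=10):
    print(step)
```

Staggering the nodes limits the capacity loss to one server at a time, though a node coming back with a low count may still attract a burst of traffic for a while.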
Over the last couple of years, Coyote Point hasn't been able to determine the cause of this or offer a solution. I still swear by their products: their Extreme II's are heavy-duty machines and much less expensive than F5 or ServerIron gear. There's just this one problem, which is a little annoying and getting worse as our traffic grows.
Any information or tips would be great.