slefain
PowerDork
8/1/20 11:55 a.m.
I'm trying to help a friend with his site. Fresh WordPress build, runs perfectly on the test environment. But put it on the production server (with about 100 simultaneous web users) and it works great for about an hour. After an hour or so the open database connections start creeping upwards, response times start to climb, and the site grinds to a halt. No slow queries reported. No heavy DB server load. Just a growing mound of database connections that smothers it.
Caching is enabled, plenty of drive space on AWS, RAM usage is fine. We smoke tested it on the test server for days, no issues. Put it live to the public and it will run like a cheetah until it just collapses. Damnedest thing I've seen.
I don't know WP specifically, but it sounds as if it's not cleaning up after itself. I'd dial down the connection wait timeout (wait_timeout in MySQL). Default is a ridiculous number like 8 hours (28800 seconds). Seems to me you could probably get away with something more like 5 minutes unless WP requires a constantly open connection.
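Before changing anything, it may help to confirm that the pile-up really is idle connections. Here's a rough sketch, assuming you've already pulled the rows from SHOW PROCESSLIST into dicts keyed by its "Id", "Command" and "Time" columns. The function name, the 5-minute threshold and the sample rows are made up for illustration.

```python
from collections import Counter


def summarize_connections(rows, idle_threshold_s=300):
    """Count connections by Command and list long-idle sleepers.

    rows: iterable of dicts with the "Id", "Command" and "Time" columns
    that SHOW PROCESSLIST returns ("Time" is seconds in the current state).
    """
    rows = list(rows)  # allow generators, since we iterate twice
    by_command = Counter(row["Command"] for row in rows)
    stale = [
        row["Id"]
        for row in rows
        if row["Command"] == "Sleep" and int(row["Time"]) >= idle_threshold_s
    ]
    return by_command, stale


# Made-up sample rows, for illustration only.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 1200},
    {"Id": 12, "Command": "Query", "Time": 0},
    {"Id": 13, "Command": "Sleep", "Time": 15},
    {"Id": 14, "Command": "Sleep", "Time": 301},
]
by_command, stale = summarize_connections(sample)
print(dict(by_command))
print(stale)
```

If most of the connections are in "Sleep" with large "Time" values, that points at clients holding connections open rather than at slow queries.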
slefain
PowerDork
8/1/20 12:58 p.m.
Keith Tanner said:
I don't know WP specifically, but it sounds as if it's not cleaning up after itself. I'd dial down the connection wait timeout (wait_timeout in MySQL). Default is a ridiculous number like 8 hours (28800 seconds). Seems to me you could probably get away with something more like 5 minutes unless WP requires a constantly open connection.
Thanks, we'll add that to the list. We're down to throwing poop against the wall at this point and hoping something sticks.
Are test and production running the same operating system version, the same package versions and the same kernel parameters?
Also, where is the MySQL DB running? On the same host as WP, or on RDS? What size/instance type is the EC2 instance?
All excellent questions, Trumant. I will try to find out. My friend sent me a screenshot of the server dashboard from when it croaked.
While you are checking settings, you should also check whether MySQL connection pooling is enabled on the WP/PHP side of things and, if so, what your connection pool limit is set to.
I'd be as confused as you are, but Keith's idea of decreasing wait_timeout is a good one. Even 5 minutes is plenty. I'd think the optimal setting would be just above your PHP execution time limit, although that could break any query browser tools that use one long-running connection. Of course that's just treating the symptom of something not cleaning up after itself, but it could work.
Do you have an idea of the number of timeouts or the percentage of error responses? Does the increased DB load coincide with increased server load?
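To put a number on "just above your PHP execution time limit", here's a minimal sketch. It assumes PHP's max_execution_time is at its usual default of 30 seconds and adds an arbitrary 5-second margin; both function names are hypothetical, and the statement it prints is the standard MySQL way to change the global value.

```python
def suggest_wait_timeout(php_max_execution_s=30, margin_s=5):
    """Return a wait_timeout (seconds) just above the PHP execution limit."""
    if php_max_execution_s <= 0:
        # In PHP, max_execution_time = 0 means "no limit".
        raise ValueError("an unlimited PHP execution time gives no upper bound")
    return php_max_execution_s + margin_s


def timeout_sql(seconds):
    """Build the MySQL statement that sets the global wait_timeout."""
    return f"SET GLOBAL wait_timeout = {int(seconds)};"


print(timeout_sql(suggest_wait_timeout()))
```

Note that SET GLOBAL only applies to connections opened afterwards, and it doesn't survive a server restart unless the value also goes into the MySQL config file.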
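If the access logs are handy, the error percentage is quick to work out. A small sketch, assuming you've already extracted the HTTP status codes into a list and that anything 500 or above counts as an error; the function name and that cutoff are my own choices.

```python
def error_rate(status_codes):
    """Return the percentage of responses with a 5xx status code."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    errors = sum(1 for code in codes if code >= 500)
    return 100.0 * errors / len(codes)


# Made-up sample: two good responses, two gateway errors.
print(error_rate([200, 200, 502, 504]))
```

Running it over short time windows, say one per minute, would show whether errors climb along with the connection count or only appear once the site has already stalled.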
Changing the timeout to 15 seconds didn't work; connections still spiked.
He's going to rebuild the server again, this time on a different server type.
I know jack-all about modern web servers, but my Google Fu is strong so I've been trying to help my friend noodle this one out. I'll report back when the new server build is tested.