Making sure you can handle bad bots

A few of our customers experienced a scenario where all available PHP worker slots were full. Mostly caused by errant bots, this scenario typically results in a slow site in mild cases, or timeouts for clients in heavy cases.

We want to empower you and make sure you can tune your Hypernode yourself! But this problem can be hard to track down. So three weeks ago, we started working on detecting when this happens and on offering investigative information to you as a customer.

Today, we’re rolling out a first version. This first version will give Hypernode Support insight into what is happening on the platform. Because the solution is rough around the edges and experimental in nature, we will send all notifications to Hypernode Support first. When the process is more polished, we’ll give you the power to tune your Hypernode by telling you what’s up.

FPM monitoring

  • When all FPM slots are full, FPM is not able to tell us what it is working on.
  • By adding a small snippet of prepend code to PHP, we’ve gained insight into what all workers are working on.
  • This information is gathered when we detect that FPM is no longer responding.
  • Also, we gather other information that might indicate the problem, such as MySQL processlists, Nginx access logs, etc.
  • We trigger a mail (sent to the Hypernode devs for now) containing 1) a notification of the problem itself and 2) a link to the gathered information.
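
As a rough illustration of the flow above (not our actual implementation), the sketch below probes FPM and only gathers diagnostics when the probe fails. It assumes FPM's `ping.path` is enabled and reachable over HTTP through Nginx. The function names and the collectors passed in are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def fpm_responds(ping_url: str, timeout: float = 2.0) -> bool:
    """Return True if the (assumed) FPM ping URL answers HTTP 200 in time."""
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Timeouts, refused connections and HTTP errors all count as
        # "not responding" for the purpose of this sketch.
        return False


def check_and_collect(
    probe: Callable[[], bool],
    collectors: Dict[str, Callable[[], str]],
) -> Optional[Dict[str, str]]:
    """Run the probe; if it fails, run every collector and return a report.

    Returns None when FPM is healthy, so nothing is gathered or mailed.
    """
    if probe():
        return None
    report: Dict[str, str] = {}
    for name, collect in collectors.items():
        try:
            report[name] = collect()
        except Exception as exc:  # one broken collector must not lose the rest
            report[name] = f"collection failed: {exc}"
    return report
```

A caller would pass something like `lambda: fpm_responds(url)` as the probe, plus one collector each for the MySQL processlist, the Nginx access log tail, and so on, and then mail a link to the resulting report.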

Other fixes

  • Reconfigured outgoing mail, to further reduce the chances of mail getting classified as spam.
  • Enabled PHP slow logs for the app user (/var/log/php5-fpm/php5-slow.log).
  • Prevented “fork bombs” by limiting the number of processes that can be spawned per shell.
  • Tweaked MySQL indexing to perform better with Magento (use_index_extensions=off).
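
To illustrate the idea behind the process limit, here is a minimal sketch using the Linux `RLIMIT_NPROC` resource limit. Whether Hypernode applies the limit this way is an assumption, and the function name is hypothetical. Note that `RLIMIT_NPROC` counts processes per user rather than strictly per shell.

```python
import resource
import subprocess
import sys


def run_with_process_cap(argv, max_procs):
    """Run argv with RLIMIT_NPROC capped at max_procs.

    If the current hard limit is already lower, that lower value is kept,
    because an unprivileged process cannot raise its hard limit.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if hard == resource.RLIM_INFINITY:
            cap = max_procs
        else:
            cap = min(max_procs, hard)
        resource.setrlimit(resource.RLIMIT_NPROC, (cap, cap))

    return subprocess.run(
        argv,
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
        check=False,
    )
```

With such a cap in place, a runaway script that keeps forking hits the limit and gets errors from `fork()` instead of exhausting the machine.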