On Hypernode we have a very sophisticated system for dealing with low-memory situations. As described in these earlier changelog articles, this system has evolved quite a bit over time, and we are constantly looking for tweaks and adjustments so that we can strike the best possible balance between stability, performance and flexibility. Over time we have noticed that the proportions of memory utilization between the various system components in a Magento store are always in flux, and that not all low-memory events are created equal.
A while back we implemented some changes that enabled our systems to preventively slay non-essential processes owned by the app user in situations where the availability of the web service would otherwise be impacted. After monitoring how this change affected the workflow of various users, we will now slightly increase the threshold at which this mechanism acts, to allow some more leeway for user-initiated processes.
Previously, when a kernel out-of-memory kill was detected in the cgroup that accommodates the non-root space, we would start terminating user processes to safeguard known essential processes as soon as we’d count two occurrences within one second. This mechanism supplements the built-in kernel out-of-memory killer, which is notoriously unpredictable due to its ‘last resort’ nature. By stepping in early we can ensure that the processes that are slain are chosen in a controlled manner.
In this release we will increase that limit from two times per second to five times per second, in order to distinguish more clearly between situations where the memory boundary is overstepped incidentally and situations where the out-of-memory condition is a manifestation of a more structural issue. This means that when a user or periodic process triggers an out-of-memory kill, our systems will no longer take control and apply precautionary measures as quickly as before.
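To illustrate the difference between the old and the new threshold, here is a minimal sketch of a sliding-window counter. It is not our actual implementation: the class name OomKillWindow, the record method and the exact window semantics (an event counts while it is less than one second old) are assumptions made for this example only.

```python
from collections import deque


class OomKillWindow:
    """Counts out-of-memory kill events inside a sliding time window.

    Hypothetical helper for illustration; not part of any Hypernode tooling.
    """

    def __init__(self, threshold=5, window_seconds=1.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()

    def record(self, timestamp):
        """Register one OOM kill; return True once the threshold is reached."""
        self._events.append(timestamp)
        # Discard events that are no longer inside the window.
        while timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


# Two kills half a second apart: enough under the old limit, not the new one.
old_limit = OomKillWindow(threshold=2)
new_limit = OomKillWindow(threshold=5)
old_result = [old_limit.record(t) for t in (0.0, 0.5)]
new_result = [new_limit.record(t) for t in (0.0, 0.5)]
```

With the old limit the second kill within the same second already reaches the threshold, whereas with the new limit it takes five kills inside one second before the counter reports that it is time to step in.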
Keep in mind that out of memory is still out of memory, so if you see that dreaded Killed output in your terminal you should take a good look at /var/log/kern.log to pinpoint the cause if you’re worried that there might be an ongoing problem.
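As a starting point for that investigation, the sketch below counts which processes were killed according to the kernel log. It assumes the kernel reports a kill with a line containing "Killed process <pid> (<name>)"; the exact wording differs between kernel versions, so treat the pattern as an assumption and adjust it to what your own log shows. The function names find_oom_kills and summarize are made up for this example.

```python
import re
from collections import Counter

# Assumed message format; verify it against your own kern.log.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Yield (pid, process_name) for every line that records an OOM kill."""
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2)


def summarize(path="/var/log/kern.log"):
    """Return a Counter of killed process names found in the given log file."""
    with open(path, errors="replace") as handle:
        return Counter(name for _pid, name in find_oom_kills(handle))
```

Calling summarize().most_common(5) would then list the process names that were killed most often, which helps to tell an incidental kill apart from a recurring one.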
Other changes:
– We have increased the cooldown on our out-of-memory mail notification from one week to two weeks
– Setting the override_sendmail_return_path using the hypernode-api will now trigger an automated update like all other settings already do
These changes will be deployed over the course of the coming week.