We’ve made several changes in this release, most notably to our log rotation setup. Here is a summary of the most important ones.
Log rotation for SFTP
We noticed several Hypernodes gradually filling up their disks because the SFTP logs grew out of proportion. To prevent this, we have enabled log rotation on the SFTP logs.
When the SFTP logs grow faster than our regular log rotation can keep up with, we perform an emergency log rotation on them. This means the log file is automatically truncated to a reasonable size as soon as our systems detect that it is growing too fast.
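To sketch the idea behind such an emergency rotation (this is an illustration only, not our actual implementation; the log path and threshold below are made-up examples):

    # Illustrative only: truncate an SFTP log in place once it crosses a size threshold.
    LOG=/var/log/sftp/users.log          # example path, not the real location
    MAX_BYTES=$((1024 * 1024 * 1024))    # example threshold: 1 GiB

    if [ "$(stat -c%s "$LOG")" -gt "$MAX_BYTES" ]; then
        # Shrink the file in place (emptied completely in this simplified sketch).
        /usr/bin/truncate -s 0 "$LOG"
    fi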
Removed outdated FTP vulnerability mitigation
Back in 2015 the vulnerability CVE-2015-3306 was found in ProFTPD version 1.3.5. To mitigate it, we disabled the mod_copy module at the time. We now run version 1.3.5a, which is no longer vulnerable, so we have removed this mitigation and the mod_copy module is available for use once again.
See the ProFTPD changelog for the full list of changes in version 1.3.5a.
Fixed Nginx emergency log rotation when disk is 100% full
Previously we used sed to truncate a large Nginx log file in order to free up disk space. However, when the disk is filled to the brim, sed no longer works, because there is no space left on the disk for the temporary copy that sed needs while rewriting the file.
We fixed this by replacing sed with /usr/bin/truncate, which shrinks the log file in place and therefore does not need to allocate any additional disk space first.
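To illustrate the difference (the commands and paths below are examples, not the literal ones from our tooling):

    # Old approach (simplified): sed -i rewrites the file via a temporary copy,
    # which fails when there is no free disk space left for that copy.
    sed -i '1,100000d' /var/log/nginx/access.log

    # New approach: shrink the file in place; no extra disk space is required.
    /usr/bin/truncate -s 0 /var/log/nginx/access.log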
Emails when unusual logging patterns are detected
Our systems will now send an email when your Hypernode shows irregular logging patterns, meaning our systems had to perform an emergency log rotation on one of your log files. Since this usually indicates a problem in your webshop, we will email you the details of the log file that is showing these irregular patterns.
Emails for Bing bot crawling footprint
When our systems detect that the performance of your Hypernode is degraded because of an abusive Bing crawler, we will send you an email with information on how to prevent this from happening again. We will not block this crawler automatically, because that would impact your SEO. Instead, we will point you to Bing Webmaster Tools, where you can configure the behaviour of this crawler on your Hypernode.
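Besides the crawl-control settings in Bing Webmaster Tools, a common way to slow down the Bing crawler yourself is a crawl-delay rule in your robots.txt. A minimal example (the webroot path is an assumption; adjust it to your own setup):

    # Example only: ask bingbot to wait 10 seconds between requests.
    printf 'User-agent: bingbot\nCrawl-delay: 10\n' >> /data/web/public/robots.txt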
Improved memory management
We noticed some flaws in our out-of-memory (OOM) monitor: some of the app user’s processes did not count towards the memory limit set in the monitor. This was because the memory of those processes was accounted to the system.slice instead of the slice our OOM monitor watches. We now once again actively place child processes of php-fpm in this separate slice, so that our OOM monitor can take appropriate action when a process of the app user allocates too much memory.
This also means that when your Hypernode runs out of memory, we will now send an email with the details of the actions our OOM monitor took.
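If you are curious where processes end up, you can inspect the slice placement yourself with generic systemd tooling (these commands are illustrative and not part of our OOM monitor; ps needs systemd support for the unit column):

    # Show the full cgroup/slice tree on the node.
    systemd-cgls

    # Show which systemd unit/slice the app user's php-fpm processes belong to.
    ps -u app -o pid,unit,rss,cmd | grep php-fpm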
Changes behind the scenes
To keep our automated processes behind the scenes running as smoothly as possible, we made some changes that allow us to operate with as few failures as possible. Here is a list of the most significant ones:
Earlier alerting for failed up- or downgrades
Our up- and downgrade process is fully automated. While we would rather have no up- or downgrade fail at all, failures can still happen when there is a series of unfortunate events.
To notify our team as soon as possible, we now send an alert whenever we notice something going wrong during an up- or downgrade. This improves our response time during these events and should reduce the time needed to fix the issue.
Automated stuck job removal
Occasionally a job gets stuck in our automated systems, which can happen for several reasons (broken pipes, network issues, the large number of deploys we do, flaky cloud APIs, etc.). Instead of removing these jobs manually, we now kill and remove them automatically. This increases job-to-job throughput and means fewer delays in our systems.
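Conceptually this boils down to bounding how long a job may run and cleaning it up when it exceeds that bound. A minimal sketch using the standard timeout utility (our actual job runner is internal; run_job.sh is a made-up placeholder):

    # Hypothetical sketch: kill a job that runs longer than an hour instead of letting
    # it block the queue, escalating to SIGKILL if it ignores SIGTERM.
    timeout --signal=TERM --kill-after=30s 1h ./run_job.sh \
        || echo "job exceeded its time limit and was removed from the queue"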
Increased the amount of upgrades and downgrades we can process simultaneously
Previously we limited ourselves to running six up- or downgrade jobs simultaneously to reduce the load on our systems. Now that we have improved the performance of those systems, we have removed this limit to further increase job throughput.
Profiling memory usage of our backend processes
We added a memory profiling tool to our systems to get more insight into the memory usage of our running processes. We hope to use this insight for optimizations that further increase job throughput in the future.
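As a rough illustration of the kind of insight we are after (generic commands, not the profiler we added; run_job.sh is again a placeholder):

    # Sample the resident memory of the heaviest processes on a machine.
    ps -eo pid,rss,cmd --sort=-rss | head -n 10

    # Measure the peak memory of a single job run (GNU time, not the shell builtin).
    /usr/bin/time -v ./run_job.sh 2>&1 | grep 'Maximum resident set size'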