Apache SSIs over AWS EFS



The Apache SSI performance problem

Apache over AWS EFS is not particularly bad when you have a CDN at the front, but once you start using Server Side Includes (SSIs), performance can really degrade regardless of any CDN optimisations. This is because of the way Apache implements SSIs internally: they require constant directory lookups against the file system, which NFS (and therefore AWS EFS) is renowned for being poor at, made worse by a recently introduced bug involving the NFS "readdirplus" option.
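To get a feel for how quickly these lookups add up, the sketch below lists the .htaccess files Apache would probe for when it maps a single included file. It rests on an assumption about the main mechanism at play: each <!--#include virtual=... --> is handled as a subrequest, and each subrequest walks the path below the document root looking for a .htaccess file in every directory where AllowOverride is enabled (with AllowOverride None these particular probes do not happen). The helper name htaccess_probes, the document root and the file names are all hypothetical.

```sh
# htaccess_probes DOCROOT FILE (hypothetical helper)
# Prints the .htaccess paths Apache would look for when mapping FILE,
# ASSUMING AllowOverride is enabled for DOCROOT and every directory below it.
# The function body runs in a subshell so the IFS and set -f changes stay local.
htaccess_probes() (
    docroot=${1%/}
    file=$2
    # Directory part of FILE relative to the document root, e.g. /a/b
    rel=${file#"$docroot"}
    rel=${rel%/*}
    dir=$docroot
    printf '%s/.htaccess\n' "$dir"
    # Split the relative path on "/" without globbing and probe each level.
    set -f
    IFS=/
    for part in $rel; do
        [ -n "$part" ] || continue
        dir=$dir/$part
        printf '%s/.htaccess\n' "$dir"
    done
)
```

For example, htaccess_probes /var/www/html /var/www/html/a/b/page.shtml prints three paths, so a page with ten includes located like that adds 30 probes to every uncached request, on top of the stat calls for each path component. Over EFS each of those can mean a trip across the network.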

In some ways, the solution to this problem is like going back to a time when disk spindles were slow.

NOTE: By default AWS EFS operates at the entry level tier, which has very low bandwidth and burst credit limits. You need to raise the tier to a more practical level for EFS to be useful, which also means a significant increase in monthly charges; this is the price you pay for performant HA managed by someone else.



Linux file cache

On a Linux server, many of Apache's disk I/O optimisation features have taken a back seat, as they cannot compete with the efficiency of the Linux kernel's built-in disk cache algorithms (e.g. hash buckets). This means that if you are running an Apache server on Linux over a local disk, you don't need to concern yourself with disk caching because the Linux kernel is already doing a sterling job for you.

However, if you are running an Apache server over NFS (or AWS EFS) then kernel caching of the network share does not appear to be at all efficient. The only caching tools you have at your disposal are those provided by NFS which are quite ordinary and often prove inadequate. Therefore you have to devise a replacement.



Thin cache

What we want to create is something much like the Linux disk cache, but we don't want files to live in cache for long periods of time.

Apache has an old module called mod_cache that can help us. This module has two sub-modules, mod_mem_cache and mod_disk_cache; however, mod_mem_cache was dropped in Apache 2.4 (where mod_disk_cache was also renamed mod_cache_disk), so modern Linux releases no longer provide it. Perhaps its relevance was lost over time.

This is not a problem, because we can easily run mod_disk_cache over a memory disk partition such as ramfs or tmpfs. Provided you have the memory available, only cache small objects for very short periods of time, and perform routine cache maintenance, this works very well as a thin cache.



Memory disk

On Linux we have a choice of using ramfs or tmpfs as a memory disk partition. This just manifests itself as an entry in /etc/fstab with a mount point, meaning they are very easy to set up.

The ramfs solution would have been ideal as it is only constrained by the physical memory on your computer, which lets you control its growth entirely from within the Apache configuration. This is ideal for dedicated web hosts. Unfortunately the mount(1) man page tells us that ramfs does not accept any mount options (not strictly true in my observations) which isn't helpful in securing the memory disk exclusively to Apache access.

On the other hand, a tmpfs memory disk accepts many mount options and appears to be the more flexible of the two. Its only drawback is that its size has to be set in two places that must agree, once for Linux (the tmpfs size= option) and once for Apache (the htcacheclean limit), but that is no big deal.

Create the mount point:

mkdir /mnt/cache_fs

Create a tmpfs entry in /etc/fstab that is 1 GB in size:
Note: uid 48 and gid 48 are the apache user and group (the default on Red Hat based systems).

tmpfs /mnt/cache_fs tmpfs uid=48,gid=48,mode=0770,nodev,nosuid,noexec,noatime,size=1024m      0 0
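Rather than hard-coding uid 48 and gid 48, you could look them up when generating the entry. The helper below is a hypothetical sketch: it takes a user name, mount point and size, and prints an fstab line with the same options as above. It assumes the user's primary group is the group Apache runs as. On Red Hat based systems the user is apache; on Debian and Ubuntu it is www-data (uid 33), which is exactly the kind of difference hard-coded numbers miss.

```sh
# cache_fstab_line USER MOUNTPOINT SIZE (hypothetical helper)
# Prints a tmpfs fstab entry owned by USER and USER's primary group.
# Runs in a subshell so "exit" only ends the function.
cache_fstab_line() (
    user=$1
    mnt=$2
    size=$3
    # Fail (status 1) if the user does not exist.
    uid=$(id -u "$user") || exit 1
    gid=$(id -g "$user") || exit 1
    printf 'tmpfs %s tmpfs uid=%s,gid=%s,mode=0770,nodev,nosuid,noexec,noatime,size=%s 0 0\n' \
        "$mnt" "$uid" "$gid" "$size"
)
```

As root, something like cache_fstab_line apache /mnt/cache_fs 1024m >> /etc/fstab would append the entry, but check the output first before writing to /etc/fstab.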

Mount the memory disk:

mount /mnt/cache_fs
df -h /mnt/cache_fs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.0G     0  1.0G   0% /mnt/cache_fs

That was pretty easy, eh!
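One failure mode worth guarding against: if the tmpfs is not mounted (a typo in /etc/fstab, say), /mnt/cache_fs is just an empty directory on the root file system and the cache will quietly fill that disk instead. A minimal check, assuming /proc/mounts is available, is to confirm the mount point is listed with file system type tmpfs before starting Apache. The function name is_tmpfs is hypothetical.

```sh
# is_tmpfs MOUNTPOINT [MOUNTS_FILE] (hypothetical helper)
# Succeeds only if MOUNTPOINT appears in the mounts table with type tmpfs.
# MOUNTS_FILE defaults to /proc/mounts (fields: device, mount point, type, ...).
is_tmpfs() {
    awk -v m="$1" '$2 == m && $3 == "tmpfs" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, is_tmpfs /mnt/cache_fs || echo "/mnt/cache_fs is not a tmpfs mount" >&2 could go in whatever script starts Apache. The optional second argument lets you test the function against a saved copy of the mounts table.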

Things get slightly more challenging with SELinux, where you will also need to set an fcontext of type httpd_cache_t for the path pattern /mnt/cache_fs(/.*)? so that Apache is allowed to use the cache.
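With the targeted policy this might look something like the sketch below, run as root (semanage comes from the policycoreutils-python package on Red Hat 6). Because a tmpfs starts out empty on every mount, I'd assume the label has to be reapplied with restorecon after each mount, for example from a boot script; as far as I know the fcontext rule alone does not relabel a freshly mounted tmpfs. The function name label_cache_fs and the RUN=echo dry-run convention are my own, not part of any tool.

```sh
# label_cache_fs [DIR] (hypothetical helper)
# Set RUN=echo to print the commands instead of running them.
label_cache_fs() {
    dir=${1:-/mnt/cache_fs}
    # Add the rule to the local policy; modify it instead if it already exists.
    ${RUN:-} semanage fcontext -a -t httpd_cache_t "$dir(/.*)?" ||
        ${RUN:-} semanage fcontext -m -t httpd_cache_t "$dir(/.*)?"
    # Apply the label to whatever is currently mounted there.
    ${RUN:-} restorecon -R "$dir"
}
```

Running RUN=echo label_cache_fs first shows exactly what would be changed. Alternatively, the SELinux context= mount option can label the whole tmpfs at mount time.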

Apache disk cache

Because we are serving web content from an NFS share and making heavy use of Apache SSIs, an Apache disk cache can help improve the overall performance of our web server.

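Load the mod_cache.so and mod_disk_cache.so modules in your Apache configuration, then create a stanza similar to this (Apache 2.2 syntax; on Apache 2.4 the disk module is mod_cache_disk.so, loaded as cache_disk_module and tested with <IfModule mod_cache_disk.c>):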

LoadModule cache_module modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

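<IfModule mod_disk_cache.c>
   # Cache every URL under / in the memory disk, using 6 levels of subdirectories.
   CacheEnable disk /
   CacheRoot "/mnt/cache_fs"
   CacheDirLevels 6

   # Times are in seconds: 15s when a response has neither an expiry date nor a
   # Last-Modified date, and never more than 300s without checking the origin.
   CacheDefaultExpire 15
   CacheMaxExpire 300
   # Cache responses with no Last-Modified header (typical of SSI output).
   CacheIgnoreNoLastMod on
   # Never store Set-Cookie headers in the cache.
   CacheIgnoreHeaders Set-Cookie

   # Size limits for cached objects, in bytes.
   CacheMaxFileSize 400000
   CacheMinFileSize 1

   # This is not really needed if your CDN is intelligent enough to prevent "thundering herd" requests.
   CacheLock on
   CacheLockPath "/mnt/cache_fs/mod_cache-lock"
   CacheLockMaxAge 5
</IfModule>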

Restart Apache and you are now in business.
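Before relying on the cache, it is worth confirming that both modules actually loaded, since the IfModule guard above silently skips the whole stanza if mod_disk_cache is missing (including on Apache 2.4, where the module has a different name). httpd -M (or apachectl -M) lists the loaded modules; the hypothetical helper below reads that list on standard input and succeeds only if the core cache module and a disk cache module are both present. It accepts both the 2.2 name (disk_cache_module) and the 2.4 name (cache_disk_module).

```sh
# has_cache_modules (hypothetical helper)
# Reads "httpd -M" output on stdin, e.g. lines like " cache_module (shared)".
has_cache_modules() {
    awk '
        $1 == "cache_module" { cache = 1 }
        $1 == "disk_cache_module" || $1 == "cache_disk_module" { disk = 1 }
        END { exit !(cache && disk) }
    '
}
```

Usage: httpd -M 2>/dev/null | has_cache_modules || echo "mod_cache or the disk cache module is not loaded" >&2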

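In the above configuration I'm simply caching objects that are between 1 and 400,000 bytes in size, keeping them for 15 seconds when the response carries no expiry information (and never for more than 300 seconds), caching them even when they have no Last-Modified header, and never storing their Set-Cookie headers.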

I'm also using basic cache locking.

Remember, our CDN is doing the actual heavy caching work, and does it more intelligently than this configuration, so keep in mind that this is a thin cache that only exists to smooth out NFS performance issues, especially when using Apache SSIs over AWS EFS.

NOTE: Because the memory disk is writeable by the Apache user itself, the cache is vulnerable to tampering through poorly written CGI scripts or poorly configured Apache servers. Refer to the Apache documentation for further details.



Apache disk cache maintenance

The Apache mod_disk_cache module isn't smart enough to handle cache cleaning itself, so Apache supplies a disk cache maintenance tool called htcacheclean that can run either as a daemon or from cron. I prefer to run it from root's crontab like this:

*/5 * * * * /usr/sbin/htcacheclean -t -p /mnt/cache_fs -l 1024M

There is no benefit in using the -n (nice) option on a memory disk, and using it just adds unnecessary delays to the cache clean up run.
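The -l limit here and the tmpfs size= option express the same thing in two places, which is why the memory disk has to be sized the same for Linux and Apache. A hypothetical helper like the one below converts both notations to bytes so a deployment script can complain if the htcacheclean limit ever exceeds the memory disk. It assumes K, M and G are powers of 1024 (tmpfs accepts k, m and g; htcacheclean accepts at least K and M); htcacheclean's B suffix and the tmpfs percentage form are not handled.

```sh
# size_to_bytes SIZE (hypothetical helper)
# SIZE is a plain decimal number (no leading zeros) with an optional
# K/M/G suffix in either case. Returns 1 for anything else.
size_to_bytes() {
    n=${1%[KkMmGg]}
    case $n in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $1 in
        *[Kk]) echo $((n * 1024)) ;;
        *[Mm]) echo $((n * 1024 * 1024)) ;;
        *[Gg]) echo $((n * 1024 * 1024 * 1024)) ;;
        *)     echo "$n" ;;
    esac
}
```

For example: [ "$(size_to_bytes 1024M)" -le "$(size_to_bytes 1024m)" ] || echo "htcacheclean limit is larger than the tmpfs" >&2. Since htcacheclean only runs every five minutes, a limit somewhat below the tmpfs size is arguably safer, as it leaves headroom for growth between runs.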

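Note: On Red Hat 6.9 with Apache 2.2.x I found that htcacheclean didn't actually remove anything, great! So I had to wrap it in a script that also deletes anything in the cache that hasn't changed for more than 10 minutes, which is comfortably longer than the 300 second CacheMaxExpire.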

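#!/bin/sh
# Trim the cache to 1024M and remove empty directories.
/usr/sbin/htcacheclean -t -p /mnt/cache_fs -l 1024M

# Fallback: delete anything not changed in the last 10 minutes, except the lock
# directory. Global options (-mindepth, -depth) go first to keep GNU find quiet.
/usr/bin/find /mnt/cache_fs -mindepth 1 -depth -not -name mod_cache-lock -cmin +10 -delete > /dev/null 2>&1

exit 0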


Gotchas

By default, mod_cache creates its cache object after all other Apache processing has completed. This means that if you have, for example, a .htaccess file in a subdirectory that rewrites all URLs to a central controlling script like this:

RewriteRule ^.*$ control.php [L]

Then the response returned by mod_cache will always be whichever request last populated the cache. That could be another user's request, which is probably not the behaviour you want.

If you are using Apache 2.3.3 or later, you can work around this with the CacheQuickHandler and AddOutputFilterByType directives. Otherwise, either exclude the location from caching with the CacheDisable directive, or supply a dummy query string in your rewrite rule like this:

RewriteRule ^(.*)$ control.php?url=$1 [L]

If you are lucky and your requests already carry query strings, then you can simply do this:

RewriteRule ^.*$ control.php [L,QSA]

The goal is the same in every case: make the resulting cache artefact unique to the incoming request.
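A quick way to check whether you have hit this gotcha is to request two different URLs that are both rewritten to control.php, within the 15 second CacheDefaultExpire window, and compare the responses. The sketch below is a hypothetical helper: it fetches two URLs with curl -fs (set FETCH to use another command) and succeeds if the bodies are identical. Identical bodies for URLs that should differ suggest one request's output is being served for another; different bodies are reassuring but do not prove the cache keys are right.

```sh
# same_response URL1 URL2 (hypothetical helper)
# Exit status: 0 if both bodies are identical, 1 if they differ, 2 if a fetch fails.
# FETCH is deliberately left unquoted so it can carry options.
same_response() {
    fetch=${FETCH:-curl -fs}
    body1=$($fetch "$1") || return 2
    body2=$($fetch "$2") || return 2
    [ "$body1" = "$body2" ]
}
```

Usage, with made-up URLs: same_response http://localhost/foo http://localhost/bar && echo "identical responses: check your cache keys"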



arthurguru, 2017.