Deploying HTML5 apps on CloudFront with efficient invalidation requests

So you decided to build your next web app or site using nothing but HTML5 and JavaScript.
No server-side processing for anything related to the UI.

This means you will be coding a lot of JavaScript.

Wouldn’t it be nice to put all that static HTML and JS on your CloudFront CDN and not deal with web servers?

Once it’s deployed it’s nice, but Amazon CloudFront isn’t meant for invalidating objects all the time; they actually charge you $0.005 for each invalidated file.
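To put that price in numbers, here’s a back-of-the-envelope sketch that multiplies the number of invalidated paths by the $0.005 figure quoted above. The function name and the file counts in the example are made up for illustration.

[python]
from decimal import Decimal

# Per-path invalidation price as quoted above.
PRICE_PER_PATH = Decimal('0.005')


def invalidation_cost(num_paths):
    '''Estimated cost in USD of invalidating num_paths objects.'''
    return num_paths * PRICE_PER_PATH


# Hypothetical site: invalidating all 4,000 files vs. only the 25 that changed.
print(invalidation_cost(4000))  # 20.000
print(invalidation_cost(25))    # 0.125
[/python]

Decimal is used instead of float so the dollar amounts come out exact.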

They tell you to version your files, but if you’re going for a deployment served 100% from CloudFront, with no web server logic behind it, you can’t play around with URL rewrites or redirects, so you’ll want to invalidate the files that have changed instead.

Here’s a script to invalidate only the files that have changed.

Our configuration
CloudFront points to a custom origin.

The custom origin web server’s document root points to
/var/www/mysite/production

‘production’ is a symlink pointing to one of the folders inside /var/www/mysite. Each of those folders holds a full copy of the website, named after its version number, for example:

[bash]
/var/www/mysite/1.0.1
/var/www/mysite/1.0.2
/var/www/mysite/1.0.3
/var/www/mysite/1.0.4
/var/www/mysite/production -> 1.0.3/
[/bash]
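Since ‘production’ is just a symlink, the version currently being served can be read straight off the link. A minimal sketch, assuming the layout above; the helper name current_version is hypothetical.

[python]
import os


def current_version(site_root, link_name='production'):
    '''Return the name of the version folder the 'production' symlink points to.'''
    target = os.readlink(os.path.join(site_root, link_name))
    # The link target may be stored as '1.0.3', '1.0.3/' or an absolute path;
    # normpath strips the trailing slash so basename returns the folder name.
    return os.path.basename(os.path.normpath(target))


# Example (hypothetical): current_version('/var/www/mysite') -> '1.0.3' for the layout above.
[/python]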

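What our script does is use rsync to compare the current production directory (in this case 1.0.3) with the next version (1.0.4), and copy every file from production that changed, or no longer exists, in the new version into a temporary folder. Files that only exist in the new version are skipped, since CloudFront has no cached copy of them to invalidate.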
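If you don’t have rsync handy, the same “changed or removed” rule can be sketched in pure Python with the standard filecmp module: walk the current production tree and keep every file that is missing from, or differs in content from, the next version. This is a swapped-in illustration, not the script we actually run, and the function name is hypothetical. One difference worth knowing: filecmp.cmp with shallow=False compares file contents, whereas rsync by default decides by size and modification time.

[python]
import filecmp
import os


def changed_or_removed(old_root, new_root):
    '''Return '/relative/path' for every file under old_root that is
    missing from new_root or whose contents differ.'''
    result = []
    for root, _dirs, names in os.walk(old_root):
        for name in names:
            old_path = os.path.join(root, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            # Removed in the new version, or same path but different bytes.
            if not os.path.isfile(new_path) or not filecmp.cmp(
                    old_path, new_path, shallow=False):
                result.append('/' + rel.replace(os.sep, '/'))
    return sorted(result)


# Example (hypothetical): changed_or_removed('/var/www/mysite/1.0.3', '/var/www/mysite/1.0.4')
[/python]

Files added in the new tree never show up, because the walk only visits the old tree.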

We then build a list of all the files in that temporary folder, point the ‘production’ symlink at the new version, and send CloudFront one invalidation request for every 1,000 changed files.
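Moving the symlink is worth doing atomically, so the origin never serves a half-switched site. One common way to do it (an assumption here, not necessarily what our tooling does) is to create the new link under a temporary name and rename it over ‘production’; on POSIX systems the rename replaces the old link in a single step. The helper name switch_production is hypothetical.

[python]
import os


def switch_production(site_root, new_version, link_name='production'):
    '''Atomically repoint site_root/link_name at the new_version folder.'''
    link_path = os.path.join(site_root, link_name)
    tmp_link = link_path + '.tmp'
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover link from an interrupted deploy
    # Relative target, matching the layout above (production -> 1.0.4).
    os.symlink(new_version, tmp_link)
    # os.replace() renames over the existing symlink itself; atomic on POSIX.
    os.replace(tmp_link, link_path)


# Example (hypothetical): switch_production('/var/www/mysite', '1.0.4')
[/python]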

Instead of invalidating every file (which would be costly), we just invalidate what changed, and we don’t have to change any of our code to reflect version numbers.

All of this is wrapped in command-line tools, so a deployment takes a single step.

Here’s some of the Python code we use to get the list of changed files, so each invalidation request includes only the files that need it and Jeff Bezos doesn’t take all our money.

[python]
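import os
import shutil
import subprocess


def checkKey(config, key):
    '''Minimal stand-in (assumed behavior): fail early if a config key is missing.'''
    if key not in config:
        raise KeyError('missing config key: %s' % key)


def getFilesInPath(path):
    '''Minimal stand-in (assumed behavior): full paths of every file under path.'''
    return [os.path.join(root, name)
            for root, _dirs, names in os.walk(path) for name in names]


def getFilesToInvalidate(config, newVersion):
    '''Compares the files in the new version folder with what's currently in production.
    Returns a list of the files that changed or got removed.'''
    checkKey(config, 'website.exports.path')

    exportsPath = config['website.exports.path']
    oldFiles = os.path.join(exportsPath, 'production')
    diffFiles = os.path.join(exportsPath, 'diff')

    # The ../NEW_VERSION is important: rsync resolves a relative --compare-dest
    # against the destination directory (diffFiles), i.e. exportsPath/NEW_VERSION.
    # rsync -av --compare-dest=../NEW_VERSION /path/to/OLD_VERSION/ /path/to/TEMP_CHANGED
    cmd = ['rsync', '-av', '--compare-dest=../%s' % newVersion,
           oldFiles + '/', diffFiles]
    print(' '.join(cmd))
    shutil.rmtree(diffFiles, ignore_errors=True)  # start from an empty diff folder
    subprocess.check_call(cmd)

    result = []
    for path in getFilesInPath(diffFiles):
        if '#' in path:  # skip any path containing '#'
            continue
        # e.g. '/var/www/mysite/diff/js/app.js' -> '/js/app.js'
        result.append('/' + os.path.relpath(path, diffFiles))
    shutil.rmtree(diffFiles, ignore_errors=True)  # clean up the temporary folder

    return result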
[/python]

Then use the returned ‘result’ list to create your CloudFront invalidation requests, with at most 1,000 paths each.
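As a sketch of that last step, here’s one way to split the list into batches of at most 1,000 paths and submit each batch with boto3’s CloudFront create_invalidation call. It assumes you pass in a client created with boto3.client('cloudfront'); the function names, the distribution ID and the CallerReference scheme are illustrative, not part of our tooling.

[python]
import time


def batches(paths, size=1000):
    '''Split paths into consecutive lists of at most `size` items.'''
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def invalidate(client, distribution_id, paths):
    '''Send one CloudFront invalidation request per batch of paths and
    return the invalidation IDs. `client` is assumed to be
    boto3.client('cloudfront').'''
    ids = []
    for n, batch in enumerate(batches(paths)):
        response = client.create_invalidation(
            DistributionId=distribution_id,
            InvalidationBatch={
                'Paths': {'Quantity': len(batch), 'Items': batch},
                # Must be unique per request: Unix timestamp + batch number.
                'CallerReference': '%d-%d' % (time.time(), n),
            },
        )
        ids.append(response['Invalidation']['Id'])
    return ids


# Usage (real AWS call, placeholder distribution ID):
#   import boto3
#   invalidate(boto3.client('cloudfront'), 'EXXXXXXXXXXXXX',
#              getFilesToInvalidate(config, '1.0.4'))
[/python]

Taking the client as a parameter keeps the batching logic testable without AWS credentials.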

Screw server-side processing to render web pages: clients can do it now with client-side templating and JavaScript.
And oh, Amazon: you guys need to step up your CloudFront Console features. It’d be awesome to invalidate paths and whatnot; ultimately it’s about what the customer needs.

Here’s a toast to a faster cloud. Cheers.
