This is the second post in a series of posts about the rebuilding of my Howick Weather Station website.
In my previous job, where I was primarily a front end developer, I started dabbling in back end technologies and familiarising myself with AWS. In particular, our team were building a serverless application using Lambda functions and API Gateway for the back end API, and hosting the front end using S3 static web hosting.
I found the whole experience with AWS really cool, and my initial plan for the weather site rewrite was something similar to the above - an API built from NodeJS Lambda functions and a single page application hosted on S3. So I spent several weeks investigating: creating DynamoDB tables, writing scripts to automate upserting data into Dynamo from my weather PC, writing Lambda functions to query Dynamo, and testing performance. Ultimately I came away from those exploratory weeks having decided not to go with AWS for this particular project, and I'll attempt to explain why below.
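For context, the Lambda functions I was experimenting with looked something like this - a simplified sketch with illustrative table and attribute names, not my actual schema:

```js
// Simplified sketch of a Lambda handler behind API Gateway that
// queries a day of observations from DynamoDB (names illustrative).
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const { date, from, to } = event.queryStringParameters;

  const result = await docClient.query({
    TableName: 'observations',
    KeyConditionExpression: '#d = :date AND #t BETWEEN :from AND :to',
    ExpressionAttributeNames: { '#d': 'date', '#t': 'time' },
    ExpressionAttributeValues: {
      ':date': date,
      ':from': Number(from),
      ':to': Number(to),
    },
  }).promise();

  return {
    statusCode: 200,
    body: JSON.stringify(result.Items),
  };
};
```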
The main reason was the trouble I had fitting my data, and the way it is queried, into DynamoDB. Straight away, Dynamo is not as well suited to time series data as other databases - yes, AWS themselves provide great documentation and examples on how to store time series data, but that doesn't cover every scenario. Based on my own experimentation, I concluded that my data and the way I want to query it made Dynamo a poor fit for this project.
For example, let's consider my observations data, which contains a row every 5 minutes covering a number of weather parameters. In MySQL, this table has a time column as the primary key (a unix timestamp) and a secondary timestamp column which stores a local timestamp in the form Y-m-d H:i (e.g. 2018-12-07 15:45). How does this fit into a Dynamo table? Choosing the unique time column as the partition key doesn't help, because partition keys only support equality comparisons and my queries select ranges of data between two timestamps. So the time column would be better as the sort key. But then what would the partition key be? I thought about using the date. That would solve selecting all the data for one particular day, but there are still issues with selecting a custom date range which extends across multiple days, since each day now lives in its own partition. For example, in some parts of my site I want to load the last 7 days of observations and graph them. Or even a full calendar month, but selecting only the observations on each exact hour (e.g. 15:00, 16:00, 17:00, …). All things that are super simple with a MySQL database.
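To make that concrete, here's a sketch of the two approaches side by side, again with illustrative table and column names. With a date partition key, a week of data means one Query per day partition:

```js
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Fetch a multi-day range from Dynamo: one Query per day partition,
// then stitch the results back together.
async function getObservations(days) {
  // days: e.g. ['2018-12-01', '2018-12-02', ..., '2018-12-07']
  const results = await Promise.all(days.map((date) =>
    docClient.query({
      TableName: 'observations',
      KeyConditionExpression: '#d = :date',
      ExpressionAttributeNames: { '#d': 'date' },
      ExpressionAttributeValues: { ':date': date },
    }).promise()
  ));
  return [].concat(...results.map((r) => r.Items));
}

// The MySQL equivalents are each a single query:
//
//   -- any range of observations between two timestamps
//   SELECT * FROM observations WHERE time BETWEEN ? AND ?;
//
//   -- a month of observations, but only the rows on the exact hour
//   -- (works because rows land on exact 5-minute marks and NZ's UTC
//   -- offset is a whole number of hours)
//   SELECT * FROM observations
//   WHERE time BETWEEN ? AND ? AND time % 3600 = 0;
```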
On top of that, I also have pages where the API returns statistics calculated on the fly. This is easy with my current MySQL setup, but becomes much harder with Dynamo, which has no aggregate functions at all. I don't doubt that I could have found some way to get all my data and API routes working using Dynamo, but for me the simplicity of a fast MySQL database won out. Who knows how much time the Dynamo solution would have cost me, or how much stress and frustration it would have inflicted ;)
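For instance, a day's statistics are one aggregate query in MySQL, whereas with Dynamo you have to pull back all the raw items for the period and crunch the numbers in the Lambda function yourself (column names illustrative):

```js
// MySQL: a single aggregate query does all the work.
//
//   SELECT MAX(temperature) AS max_temp,
//          MIN(temperature) AS min_temp,
//          AVG(wind_speed)  AS avg_wind,
//          SUM(rainfall)    AS total_rain
//   FROM observations
//   WHERE time BETWEEN ? AND ?;
//
// DynamoDB has no aggregate functions, so after querying every item
// in the range you compute the statistics in code:
function computeStats(items) {
  return {
    maxTemp: Math.max(...items.map((i) => i.temperature)),
    minTemp: Math.min(...items.map((i) => i.temperature)),
    avgWind: items.reduce((sum, i) => sum + i.wind_speed, 0) / items.length,
    totalRain: items.reduce((sum, i) => sum + i.rainfall, 0),
  };
}
```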
Also, while testing performance, I found that real-world in-browser request times with a Lambda + Dynamo API were consistently higher than with my current setup, which is hosted locally in the same city that I live in. There is not only higher latency - the closest AWS region is Sydney, Australia - but due to the way Lambda functions work, cold starts add extra time onto API calls as well. And yes, I know you can keep your Lambda functions warm by pinging them at regular intervals, but I wasn't really a great fan of this approach, and still, even warm Lambda functions were not always super fast. DynamoDB query times seemed variable as well. This performance decrease would be a huge negative for me - I'm treating this whole website rebuild as an upgrade to my existing solution, and a drop in performance rather goes against that idea!
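For reference, the usual warming setup is a CloudWatch scheduled rule that invokes the function every few minutes, with the handler short-circuiting on those ping events - something like this:

```js
exports.handler = async (event) => {
  // CloudWatch scheduled events arrive with source 'aws.events';
  // return early so the warming ping does no real work.
  if (event.source === 'aws.events') {
    return;
  }
  // ... handle a real API Gateway request
};
```

It works, but it's extra moving parts just to paper over the platform's startup cost.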
So, having decided MySQL is the way to go, what about the AWS services which provide this for me? I did not spend much time here, as the cost of running a MySQL database on AWS is too high. For example with RDS, a t2.micro in Sydney is priced at 0.026 USD/hr, which works out to roughly 228 USD/year (0.026 × 24 × 365). I only need a couple of GB at most for my database, but RDS has a minimum SSD storage of 20GB, so I would be paying for a whole lot of space I will never use. Sticking with my local web hosting would not only be faster, but cheaper as well.
Therefore I decided to stick with my local hosting on a PHP/MySQL server, which continues to deliver blazing fast performance. The vast majority of my API calls return to the browser within 150ms, and many come in under 100ms. With AWS, I saw requests that queried Dynamo take up to 10 times longer in some cases!
I still think AWS is a great platform for building and deploying web applications, but it's not always the best approach, especially for small-scale projects. You have to consider the size and location of your user base too. For a relatively simple application where almost all of the users are in the same geographic region as you, it can make more sense to host locally and get that low latency (unless you happen to live near an AWS data center). The vast majority of my website visitors are located in the same city as me, so hosting within my city gives most of them a fast, responsive experience.