When it comes to scalability, architecture diagrams can give us a false sense of security.
You may have designed and provisioned a slick cloud infrastructure, used an enterprise container orchestration tool, or adopted a battle-tested distributed streaming platform. Or gone serverless. But none of these guarantees that your application will scale.
No amount of planning or whiteboarding can tell you how many concurrent users your application can support. Nor can it tell you how your application will behave under load. Or at what point (and where) it will break.
There are two reasons for this.
Firstly, despite their best intentions, many developers write code that doesn’t scale.
The tricky bugs that keep developers up at night usually involve memory leaks, deadlocks and race conditions. Such bugs aren’t easily identified using ordinary manual (or automated) functional tests. Instead, they crop up in production during periods of extremely high or unexpected traffic.
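To make that concrete, here is a minimal Python sketch (the names and numbers are hypothetical) of a race condition that sails through a single-threaded functional test but loses updates as soon as work happens concurrently:

```python
import threading
import time

# A hypothetical shared counter, standing in for any read-modify-write on
# shared state: stock levels, session counts, cache entries.
counter = 0

def unsafe_increment():
    global counter
    # Read-modify-write with no lock. The tiny sleep stands in for real work
    # (a database call, a cache lookup) and widens the window in which
    # another thread can interleave between the read and the write.
    current = counter
    time.sleep(0.0001)
    counter = current + 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 200. With the race, the result is usually far lower because
# concurrent threads overwrite each other's updates.
print(f"expected 200, got {counter}")
```

A functional test that calls unsafe_increment() once, or even a few hundred times in sequence, passes every time. The lost updates only appear when the code is exercised with realistic concurrency, which is exactly what a performance test does.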
Secondly, bottlenecks in IT systems often occur in places developers haven’t anticipated. You might assume that your application can be 'scaled out' through the automatic provisioning of extra virtual machines. Or that you can rely on Kubernetes. But this won’t help when the problem is a database that’s run out of memory. Or a clogged network. Or an exhausted pool of WebSocket connections.
Ultimately, the only way to find out whether an application truly scales is to properly performance test it.
The Cost of Ignoring Performance Testing
In my experience, many teams don’t carry out any performance testing. There are many reasons for this, including not being given enough time, not having the tools or in-house expertise, or simply not seeing the need.
Some teams believe that they can rely on functional testing, reference architectures and the auto-scale features of cloud vendors. But functional testing won’t find the performance-related issues associated with realistic high load. Nor will reference architectures and auto-scaling help, because every code base is unique.
This scalability fallacy, the assumption that your application will scale simply because you've used a ‘scalable architecture’, has serious commercial implications. Systems that don’t scale to meet user demand can cause significant revenue losses and a degraded customer experience.
For example, system downtime and slow response times on e-commerce websites result in customers abandoning online purchases. In 2013, Amazon’s servers crashed for 30 minutes and the company lost an estimated $66,240 per minute, close to $2 million in total.
It’s not just tech giants that need to worry about performance. ChannelAdvisor estimates that online retailers lose about 4% of a day’s sales for each hour of website downtime. A site doesn’t need to go completely down to have a negative impact on user experience either.
It’s well known that slower response times increase the bounce rate of any type of website. The BBC, for instance, noticed that they lost 10% of users for every additional second of page load latency. Part of their solution was to gracefully remove less important features from a page whenever they detected an increase in page load latency.
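As a rough illustration of that pattern (the function names and threshold here are mine, not the BBC's), a page renderer might track a rolling window of its own render times and skip optional components once the average creeps past a latency budget:

```python
from collections import deque
from statistics import mean
import time

LATENCY_BUDGET_SECONDS = 1.0          # illustrative threshold only
recent_latencies = deque(maxlen=100)  # rolling window of recent render times

def load_article(article_id):
    # Stand-in for the essential data fetch that must always happen.
    return {"id": article_id, "body": "..."}

def load_recommendations(article_id):
    # Stand-in for an optional, slower downstream call.
    return ["rec-1", "rec-2"]

def degraded():
    # Degrade once the rolling average render time exceeds the budget.
    return bool(recent_latencies) and mean(recent_latencies) > LATENCY_BUDGET_SECONDS

def render_page(article_id):
    start = time.monotonic()
    page = {"article": load_article(article_id)}  # essential content
    if not degraded():
        # Optional components are the first thing to go under pressure.
        page["recommendations"] = load_recommendations(article_id)
    recent_latencies.append(time.monotonic() - start)
    return page
```

The core content always renders; the nice-to-haves are shed automatically when the system starts to slow down.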
‘Black Friday’-type peaks in traffic can be planned for and handled properly. But only if you know where, when and how your system will break.
Without this information, you rely on your customers unintentionally discovering the breaking points. Or, worse, on malicious users finding them deliberately. System crashes can leak valuable system information to an attacker, create temporary vulnerabilities, or expose unprotected files and memory dumps.
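Finding those limits before your users or attackers do is what a load test is for. As a minimal starting point, here is a sketch of a scenario for the open-source Locust tool; the host and endpoints are placeholders for your own staging environment:

```python
# loadtest.py: a minimal Locust scenario. The host and endpoints below are
# placeholders; point it at a staging environment, not production.
from locust import HttpUser, task, between

class Shopper(HttpUser):
    wait_time = between(1, 3)  # simulated think time between requests, in seconds

    @task(3)
    def browse_products(self):
        self.client.get("/products")

    @task(1)
    def add_to_basket(self):
        self.client.post("/basket", json={"product_id": 42, "qty": 1})

# Run headless, ramping to 500 simulated users at 10 users per second:
#   locust -f loadtest.py --headless -u 500 -r 10 --run-time 15m \
#          --host https://staging.example.com
```

Re-running with progressively higher user counts, and watching where response times and error rates start to climb, tells you where, when and how the system breaks.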
The solution is to invest in some professional performance testing. This will be cheaper than the cost of abandoned carts, reputation damage, potential security issues, and data corruption that system downtime may cause.