Cloud finance, or how to (un)manage costs in the cloud

The cloud can be a valuable asset, but it can also become a costly burden. This is especially true in the realm of finance and expenses. The growing number of inquiries from clients struggling with cloud cost management prompted this article. Additionally, we've encountered similar challenges within our organization, as financial resources are always finite.

My cloud finance recommendations fall into two groups: process recommendations, which apply to any environment, and technical recommendations, which focus primarily on Amazon Web Services (I am an AWS Principal Cloud Architect, after all) – though many of the latter can be adapted to any hyperscaler.

Cloud finance on a budget

So what should you look at if you’re starting to worry about cloud spend? Be sure to start with the processes.

The following diagram was created by Lukáš Klášterský, Zdeněk Mach and their colleagues from the Governance team (who can help you with more than just cloud strategy). Its core idea is the budget, i.e. an estimate of the cost of operating the infrastructure over a given period.

In most cases, the budget is defined for a year but is based on the expected monthly spend. It is created when the application is designed, using the AWS Pricing Calculator or the Azure Pricing Calculator.

At this stage, we recommend that customers primarily calculate the “main” components of the application, i.e. those with a significant impact on the overall price. It is then advisable to increase the calculated price by about 10% to cover the various minor services whose exact cost is impractical (or too time-consuming) to calculate.

So, in our example above, we have an annual budget of $60,000 and a projected spend of $5,000 per month. That’s a good start, because we now have a baseline to work from.
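As a sanity check, the arithmetic above can be sketched in a few lines; the component prices below are made-up illustration values, not real calculator output:

```python
# Rough budget estimate: sum the "main" components from the pricing
# calculator, add a ~10% buffer for minor services, then project the year.
def estimate_budget(main_components: dict[str, float], buffer: float = 0.10):
    monthly = sum(main_components.values()) * (1 + buffer)
    return monthly, monthly * 12

# Illustrative (made-up) monthly prices for the main components:
components = {"EC2": 2500.0, "RDS": 1400.0, "ALB + data transfer": 645.45}
monthly, annual = estimate_budget(components)
print(f"Monthly: ${monthly:,.2f}, Annual: ${annual:,.2f}")
```

With these numbers the estimate lands on the $5,000/month and $60,000/year figures used in the example.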

 

Use of budget (consumption)

If you know how much you plan to spend each month, you should monitor your spending regularly. If you’re spending more than you planned, you need to increase your budget (and vice versa – if you’re spending less, you can decrease it).
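The regular check described here can be sketched as a trivial helper; the ±10% tolerance band and the spend figures are illustrative assumptions:

```python
# A sketch of the monthly check: compare actual spend against the plan
# and flag both overspending and significant underspending.
def check_consumption(planned: float, actual: float, tolerance: float = 0.10) -> str:
    ratio = actual / planned
    if ratio > 1 + tolerance:
        return "over budget - investigate, then raise the budget if legitimate"
    if ratio < 1 - tolerance:
        return "under budget - consider reallocating the surplus"
    return "on track"

print(check_consumption(planned=5000, actual=6200))  # flags overspending
print(check_consumption(planned=5000, actual=3100))  # flags underspending
```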

Higher budget consumption 

You probably did not take something into account when analyzing and designing the application, or something changed at the infrastructure level: an additional server was added, you needed more performance, etc. This is, of course, a perfectly legitimate use, but it is important to know about it.

A nice example is an application that only barely paid for itself when initially proposed but was implemented anyway. During its lifetime, new features have been added (with a cost impact), but the yield (or other benefit) of the application has not changed.

So you may find yourself in a situation where the application is no longer profitable and it’s time to “reprice” it or even cancel it. However, if you don’t track the budget on which the application’s entire business case is based, you won’t have the data to make that decision.

Lower budget consumption

This situation is, of course, preferable. The business case of the application is “better than planned”, and you can reallocate the surplus budget elsewhere – to other applications, to developing new features, or to anything that is running over budget.

But again – you need the tools to collect and evaluate this information. Otherwise you will always be working with estimates of greater or lesser inaccuracy.

So how do we monitor the budget?

Tools that monitor cloud finances

AWS Budgets 

In an AWS environment, the easiest option is AWS Budgets, which lets you define what you want to monitor and how.

If your budget is overdrawn (either by current costs or by those forecast for the end of the month), you are notified immediately.
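For reference, a monthly $5,000 cost budget like the one in our example can also be created programmatically (e.g. via `aws budgets create-budget`); the budget definition would look roughly like this, with the budget name being a placeholder:

```json
{
  "BudgetName": "app-monthly-budget",
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY",
  "BudgetLimit": {
    "Amount": "5000",
    "Unit": "USD"
  }
}
```

Notification thresholds (e.g. alert at 80% of actual or forecasted spend) are attached separately as notifications with subscribers.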

The disadvantage of AWS Budgets is that it only reports an overdrawn budget, not the situation where you significantly underuse it. That is where the second tool enters the scene.

AWS Cost Explorer

AWS Cost Explorer provides a detailed analysis of your cloud spending.

This brings us back to cloud finance processes: you should clearly define who uses AWS Cost Explorer and how often. I usually recommend that customers designate a responsible person who works with the tool at least quarterly (ideally monthly).

AWS Cost Explorer is a powerful tool that lets you filter and display the data you want in a variety of ways. Interested in a specific application? Filter by its tag. In the resources used, or in a particular environment? All of this is relatively simple to do.
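The same filtering is available through the Cost Explorer API (`GetCostAndUsage`). A request that breaks one application's December 2023 spend down by service might look like this; the `Application` tag key and its value are assumptions for illustration:

```json
{
  "TimePeriod": { "Start": "2023-12-01", "End": "2024-01-01" },
  "Granularity": "MONTHLY",
  "Metrics": ["UnblendedCost"],
  "GroupBy": [{ "Type": "DIMENSION", "Key": "SERVICE" }],
  "Filter": {
    "Tags": { "Key": "Application", "Values": ["my-app"] }
  }
}
```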

 

Output from AWS Cost Explorer, nicely showing a significant increase in December 2023 in the consumption of the Relational Database Service (i.e. the platform database) and Certificate Manager.

So, we have a budget and we can track it. Now comes the second, more technical part: optimization.

How to optimize your cloud finances

Let’s take a look at five areas that can help you optimize the costs of your cloud environment. Not every point will suit everyone, but most will be applicable in almost any environment.

1) Pricing models 

The first area is the application of different pricing models. In AWS, there are basically two options: pay-as-you-go and reserved. In principle, lower flexibility comes at a lower price. I don’t want to go into great detail (resource reservation is quite a complicated area), but the basic division is as follows:

  • Reserved Instances (either standard or convertible),
  • EC2 Instance Savings Plan,
  • Compute Savings Plan.

It would take a separate article to discuss which method is most suitable. What I consider important is being able to distinguish between what should run in the PAYG model and where it makes sense to reserve resources (with any type of reservation).

In your environment, you will find resources that you know will be running for at least another year – these are good candidates for reserved capacity.
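To illustrate the trade-off, here is a back-of-the-envelope comparison for a resource running 24×7 for a year; the hourly rates are made-up examples, not AWS list prices:

```python
# Illustrative on-demand vs. reserved comparison for a resource that will
# run 24x7 for the next year (all prices are made-up examples).
HOURS_PER_YEAR = 24 * 365

def annual_cost(hourly_rate: float, hours: float = HOURS_PER_YEAR) -> float:
    return hourly_rate * hours

on_demand = annual_cost(0.10)    # assumed $0.10/h pay-as-you-go
reserved = annual_cost(0.062)    # assumed $0.062/h with a 1-year reservation
saving_pct = (on_demand - reserved) / on_demand * 100
print(f"On-demand: ${on_demand:.0f}, reserved: ${reserved:.0f}, saving {saving_pct:.0f}%")
```

The same arithmetic works in reverse: a reservation for a workload you decommission after three months costs more than PAYG would have.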

 

2) Perceived vs. real needs

The second area I recommend focusing on is the question of what you really need vs. what you think you need.

You need 4 CPUs and 16 GB of RAM. Why? “Because that’s what we always use” is the typical wrong answer.

In the cloud, you should run only what you actually need. Your drive should not be 500 GB because “you don’t want to deal with it”; it should be the size you need (with a reasonable margin). The same goes for your server.

Of course, this requires monitoring: if you don’t know how existing resources are actually used, you can’t know whether you need them.

So monitor your resources and adjust them accordingly. The unwritten rule says, “If you don’t know how much, you’d better have less.” Increasing resources is easy; decreasing them (in some cases) is a bit harder.
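As a sketch of what “monitor and adjust” can mean in practice, here is a deliberately naive rightsizing check based on peak CPU utilization; a real decision should also consider memory, I/O and a longer observation window:

```python
# Naive rightsizing heuristic (sketch): if peak CPU utilization stays far
# below capacity over the whole observation window, the instance is a
# candidate for downsizing.
def downsize_candidate(cpu_peaks_pct: list[float], threshold: float = 40.0) -> bool:
    return max(cpu_peaks_pct) < threshold

# Illustrative daily CPU peaks (%) over one week:
week_of_daily_peaks = [12.5, 18.0, 9.3, 22.1, 31.0, 14.7, 11.2]
print(downsize_candidate(week_of_daily_peaks))
```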

 

3) Operation of non-production environments

I dare say that some non-production environments do not need to run at all, or that it is enough for them to be available during working hours.

A nice illustration is the pay-as-you-go model. Running an environment 24×7 means an average of 720 hours per month. Running the same environment only from 7am to 7pm on weekdays means roughly 240 hours per month. A fine example of a 66% saving.
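The arithmetic behind that claim:

```python
# 24x7 vs. working-hours operation of the same environment.
full_time = 24 * 30            # ~720 billed hours per month
working_hours = 12 * 5 * 4     # 7am-7pm, weekdays only: ~240 hours per month
saving = (full_time - working_hours) / full_time
print(f"{full_time}h -> {working_hours}h, saving {saving * 100:.1f}%")
```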

In reality, many environments do not even need to be available all day – only when they are actually needed.

So how do you automate switching the environment on and off? A quick solution is AWS Step Functions, where you can drag and drop a workflow that turns the environment on or off.

It might look like this:
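Since the workflow is easiest to show as a diagram, here is only a minimal sketch of the “switch off” half in Amazon States Language, using the Step Functions SDK integration for EC2 (the instance ID is a placeholder, and the symmetric `startInstances` workflow would mirror it); in practice you would trigger both on a schedule, e.g. with Amazon EventBridge:

```json
{
  "Comment": "Sketch: stop non-production EC2 instances outside working hours",
  "StartAt": "StopInstances",
  "States": {
    "StopInstances": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:stopInstances",
      "Parameters": {
        "InstanceIds": ["i-0123456789abcdef0"]
      },
      "End": true
    }
  }
}
```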

4) Technical review of services and infrastructure

I recommend reviewing services and infrastructure at least once a year, or whenever there is a major application change. The purpose of the review is to evaluate the components in use and, where appropriate, replace them with alternatives that were not available (or could not be used) when the application was implemented.

One thing we recommend alongside this process is a more comprehensive AWS Well-Architected review, which will reveal your application’s weak points.

Example 1: In its initial version, the app only supported storing data locally on Amazon Elastic Block Store volumes. The new version, however, supports saving to an Amazon S3 bucket – and EBS is much more expensive than S3.

Example 2: The application did not initially support scaling, so you had two large EC2 instances running continuously. Today, however, the number of EC2 instances can be scaled dynamically according to the actual load.

Example 3: AWS has released a new service or a new generation of an existing one. A beautiful example is T2 vs. T3a instances. When you create a new EC2 instance, T2 is still the “default”, but the newer T3a instances offer higher performance at a lower price – so it may make sense to move all T2 instances to T3a. A saving of 20% is not to be sniffed at. (This applies to all instance types; AWS is continually introducing new generations, and it is worth keeping track of them.)

It is exactly the same with Elastic Block Store (EBS). The default is still the gp2 variant, but gp3 (both are general-purpose SSD variants) delivers higher performance at a lower price. The result is again a saving of around 20%.
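The percentage is easy to verify; the per-GB prices below are assumptions for illustration, not current AWS list prices:

```python
# Savings from moving to a newer storage (or instance) generation.
def saving_pct(old_price: float, new_price: float) -> float:
    return (old_price - new_price) / old_price * 100

gp2_per_gb = 0.10   # assumed $/GB-month for gp2
gp3_per_gb = 0.08   # assumed $/GB-month for gp3
print(f"gp2 -> gp3: {saving_pct(gp2_per_gb, gp3_per_gb):.0f}% cheaper")
```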

5) Every decision matters 

In AWS, you pay for everything – and some of your decisions can have a significant impact on the price.

 

It is therefore advisable to be aware of these things. Or consult us – we are happy to help with anything, whether infrastructure upgrades, optimization or application migration.