One approach is to deploy tooling that will report cloud costs, review outputs regularly (once a month to once a week) and implement changes based on these reports.
To implement this correctly, we must rely on something other than manual processes; we cannot hope that people will remember to tag their resources correctly or that we will be able to forecast our utilisation accurately for the next 12 months.
The cloud moves quickly and people simply can’t keep up; from start-ups to enterprise estates, things scale and change far faster than we can predict or react. This is why FinOps needs to form part of the delivery process, just as much as security.
Modern organisations automate their security. CSPs have made this fairly simple by providing tools to assist in doing so, and the same is true for FinOps. Adding automated cost management into software and infrastructure pipelines shifts FinOps left, minimising engineering impact and reducing cognitive load.
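As one illustration of a check that could run inside a pipeline, here is a minimal sketch of a tagging gate. It assumes resources have already been exported as plain dictionaries (for example, parsed from an infrastructure plan); the tag names and the `missing_tags` helper are hypothetical and not part of any CSP's tooling.

```python
# Hypothetical tagging policy: the tag names are illustrative assumptions.
REQUIRED_TAGS = frozenset({"owner", "cost-centre", "environment"})


def missing_tags(resources, required=REQUIRED_TAGS):
    """Map each resource name to the sorted list of required tags it lacks."""
    problems = {}
    for resource in resources:
        absent = set(required) - set(resource.get("tags", {}))
        if absent:
            problems[resource["name"]] = sorted(absent)
    return problems


# Example input, as it might look after parsing a deployment plan.
planned = [
    {"name": "reports-db",
     "tags": {"owner": "data", "cost-centre": "42", "environment": "prod"}},
    {"name": "scratch-bucket", "tags": {"owner": "data"}},
]

violations = missing_tags(planned)
for name, tags in violations.items():
    print(f"{name}: missing tags {', '.join(tags)}")
```

A pipeline step could fail the build whenever `violations` is non-empty, so that untagged resources never reach the bill in the first place.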
Everything you do in the cloud has a cost, from bandwidth to API calls to virtual entities like secrets. Then there are factors that organisations cannot control with the weekly review — how often have we seen messages online like “because of X, I’ve received a huge cloud bill that’s much larger than I expected”?
Sometimes the cloud provider will help and offer a discount, but that’s not always the case. This is where automation can help. Organisations can and should configure their cloud to send alerts when their bill exceeds a fixed limit, but even that should not be a one-off, set-and-forget exercise.
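To make the alerting idea concrete, here is a minimal sketch that checks spend against several fractions of a budget rather than a single fixed limit, and adds a naive linear forecast. The threshold values, function names and figures are assumptions for illustration; in practice the spend figure would come from your provider's billing export.

```python
def crossed_thresholds(spend_to_date, monthly_budget, thresholds=(0.5, 0.8, 1.0)):
    """Return every budget fraction that current spend has reached."""
    ratio = spend_to_date / monthly_budget
    return [t for t in sorted(thresholds) if ratio >= t]


def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear projection: assumes spend continues at the average daily rate."""
    return spend_to_date / day_of_month * days_in_month


# Illustrative numbers only.
budget = 1000.0
spend = 850.0

alerts = crossed_thresholds(spend, budget)
projected = forecast_month_end(spend, day_of_month=17, days_in_month=30)

for t in alerts:
    print(f"Spend has reached {t:.0%} of the monthly budget")
if projected > budget:
    print(f"Projected month-end spend {projected:.2f} exceeds budget {budget:.2f}")
```

Running a check like this daily, and revisiting the budget figure as the estate changes, keeps the alert from becoming a one-off.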
You might think FinOps is just about monitoring your production environment and applying corrective actions. But FinOps should start as close as possible to your development environment.
Here’s one example. Imagine a developer changing a line of code and triggering double the cloud API calls for the same action. This might not seem like a big deal. However, if we allow that change into production, where that API is triggered hundreds of thousands of times every Friday to generate weekly reports, then your weekend will be spent sifting through a mountain of identical reports!
In general, what is desirable is the shortest possible feedback loop between the cloud environment and the engineering team. Having something as simple as an alert saying, “You just doubled your cost for this test case” might have saved you from this incident.
Weighting each test case by how often the same API call runs in production, so that a test reports its estimated cost impact at production scale, makes an excellent pre-emptive system.
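The weighting idea can be sketched as follows. This is only an illustration under stated assumptions: the per-test call counts would come from instrumenting your cloud client during the test run, and the weights (production invocations per week) and price per call are made-up figures, as is the `cost_regressions` helper.

```python
def cost_regressions(baseline_calls, current_calls, prod_weight,
                     price_per_call, tolerance=0.10):
    """Flag test cases whose API call count grew beyond the tolerance.

    Returns tuples of (test name, baseline calls, current calls,
    estimated extra weekly cost at production scale).
    """
    findings = []
    for test, base in baseline_calls.items():
        now = current_calls.get(test, base)
        if base > 0 and (now - base) / base > tolerance:
            # Extra calls per run, scaled by how often production runs this path.
            extra_cost = (now - base) * prod_weight.get(test, 1) * price_per_call
            findings.append((test, base, now, extra_cost))
    return findings


# Illustrative figures only.
baseline = {"weekly_report": 2, "login": 1}
current = {"weekly_report": 4, "login": 1}       # the change doubled the calls
weights = {"weekly_report": 300_000, "login": 50_000}

findings = cost_regressions(baseline, current, weights, price_per_call=0.0004)
for test, base, now, extra in findings:
    print(f"{test}: API calls went from {base} to {now}; "
          f"estimated extra cost per week: {extra:.2f}")
```

Failing or warning on a non-empty `findings` list gives the engineer the "you just doubled your cost for this test case" message before the change reaches production.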
In the same way that you monitor your system for critical metrics (e.g. latency) and have your engineering team exposed to them, so they can spot changes right after deployments, you want your team exposed to FinOps metrics continuously. And you must allow them to respond appropriately when code/infra changes trigger unexpected events.