Modern data warehouses have a problem. It’s been growing for a while, but in recent years it has become impossible for organizations of all stripes to ignore.
Although the exact number is hard to pin down, the world now creates data on the order of zettabytes each year. The Internet of Things (IoT), the growth of synthetic data, increased use of digital products during the pandemic, and other factors have only accelerated this exponential trend.
As a result, modern data warehouses operate in more complex and more dynamic environments than even a decade ago. And with complexity and scale come two major challenges. First, costs are higher. Second, processing larger data sets significantly slows performance.
Back in the day, when everything was on-prem, the solution was simple: get a database administrator to adjust warehouses to their workloads, ensuring adequate provisioning without overspending on unused resources.
But manual optimization has quickly become unfeasible. In fast-paced cloud environments, warehouse workloads are constantly in flux. A scaled-down warehouse may suddenly need more resources to handle a spike in demand, but only for ten minutes.
This means manual warehouse optimization requires a DBA to watch all those warehouses like a hawk, making real-time changes and then reverting those changes once query activity goes back to normal. Multiply that effort across hundreds or even thousands of warehouses, each of which has its own unique workloads, demands, and ongoing changes.
Obviously, this doesn’t happen in real life. Which means we’re back to a bunch of warehouses that are either too costly or underperforming.
Our founder was tired of sitting on the sidelines and watching this happen. He knew that there had to be a better way.
That better way is what we now call Keebo.
How does Keebo work?
In a sentence, Keebo automatically optimizes Snowflake warehouses in real time through AI-powered algorithms. Here are the three approaches we use to accomplish this objective.
Approach #1: Predictive resource management
Predictive resource management in Keebo utilizes machine learning (ML) to identify trends in Snowflake’s historical data and forecast future demands. This involves analyzing:
- Activity logs
- Performance metrics
- Previous optimizations (& their success/failure)
- Usage patterns
Keebo accomplishes this through reinforcement learning (RL), an AI technique that mimics how humans learn over time more closely than traditional supervised learning. An RL algorithm works with a set of states and actions, where each action takes the agent from one state to another. A so-called reward function captures whether the outcome is desirable or not. In other words, the RL algorithm rewards the agent whenever it takes an action that leads to a desirable outcome, and penalizes it whenever an action leads to an undesirable outcome. For example, in Keebo’s context, the RL algorithms reward actions that save money for the customer but penalize actions that cause a slowdown. Additionally, Keebo deploys many advanced, patented algorithms to engage in continuous learning and adaptation.
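To make the reward idea concrete, here is a minimal sketch of what a reward function of this kind could look like. It is not Keebo’s actual implementation: the `Outcome` class, its fields, and the `slowdown_penalty` weight are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical result of one optimization action on a warehouse."""
    credits_saved: float       # credits saved versus the default configuration
    latency_increase_s: float  # added query latency in seconds (0 if none)


def reward(outcome: Outcome, slowdown_penalty: float = 5.0) -> float:
    """Reward savings and penalize slowdowns (the weight is illustrative)."""
    # A latency improvement is not rewarded here; only slowdowns are penalized.
    slowdown = max(0.0, outcome.latency_increase_s)
    return outcome.credits_saved - slowdown_penalty * slowdown


# An action that saves credits with no slowdown earns a positive reward;
# the same saving with a 2-second slowdown is penalized.
print(reward(Outcome(credits_saved=4.0, latency_increase_s=0.0)))
print(reward(Outcome(credits_saved=4.0, latency_increase_s=2.0)))
```

The penalty weight controls the trade-off: the larger it is, the more savings an action needs to deliver before a slowdown is considered worth it.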
This enables us to dynamically adjust resource allocations based on both predicted and real-time needs. For example, let’s say you have a Large Snowflake warehouse that can handle its current workload even if it is downsized to Medium. But a sudden spike in usage requires you to scale it back up to its default size, i.e., Large. If you don’t make that change quickly, your performance will lag. But if you keep the warehouse at Large for too long, you end up spending money on resources you don’t need.
By analyzing historical data and monitoring real-time load, Keebo can predict with significant accuracy when your warehouses will require more provisioned resources and react whenever those assumptions are no longer true. Then, Keebo automatically makes the necessary changes in real time—anytime, anywhere.
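The Large-to-Medium example can be sketched as a simple decision rule. This is an illustration under stated assumptions, not Keebo’s algorithm: `choose_size`, the load inputs (expressed as a fraction of the default size’s capacity), and the 0.5 threshold are hypothetical. The threshold reflects the fact that each Snowflake warehouse size step roughly doubles compute, so a warehouse running at half capacity or less could in principle drop one size.

```python
# Snowflake warehouse sizes in ascending order (a subset, for illustration).
SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"]


def choose_size(default_size: str, predicted_load: float,
                observed_load: float, downsize_threshold: float = 0.5) -> str:
    """Pick a size from forecast and real-time load (hypothetical rule).

    Loads are fractions of the default size's capacity, e.g. 0.3 = 30%.
    """
    # Trust whichever signal is higher, so a real-time spike overrides
    # an optimistic forecast and the warehouse returns to its default size.
    load = max(predicted_load, observed_load)
    idx = SIZES.index(default_size)
    if load <= downsize_threshold and idx > 0:
        return SIZES[idx - 1]  # one size smaller is enough
    return default_size


print(choose_size("LARGE", predicted_load=0.3, observed_load=0.2))  # quiet period
print(choose_size("LARGE", predicted_load=0.3, observed_load=0.9))  # sudden spike
```

Taking the maximum of the two signals is the conservative choice: the forecast alone decides when to scale down, but either signal can force a scale-up.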
Approach #2: Performance guardrails
Most of Keebo’s real-time cost optimizations result in little or no reduction in warehouse performance. However, there are occasional exceptions to that rule. In those cases, especially when user demands or formal SLAs require adherence to strict performance standards, your automated optimizations need guardrails in place.
To ensure reliable performance at all times, Keebo utilizes Performance Guardrails as a core part of our automated cost optimizations.
Because not all data warehouses are created equal, Keebo Performance Guardrails include flexible parameters for latency, queue time, and number of queued queries to prevent performance degradation.
If a warehouse crosses the set value for any of those metrics, Keebo will back off its optimizations and revert the warehouse to its default state, providing ample compute resources to handle its workload.
Specifically, Keebo offers Performance Guardrails around the following metrics:
- Total queue time
- Number of queued queries
- Query latency—95th percentile, where Keebo backs off when 5% of queries do not meet the desired latency parameters
- Query latency—99th percentile, where Keebo backs off when 1% of queries do not meet the desired latency parameters
- Query latency—max latency, where Keebo backs off when a single query exceeds the latency parameters
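The guardrail checks listed above can be sketched as follows. This is a hedged illustration rather than Keebo’s implementation: the `Guardrails` class, its field names, the nearest-rank percentile method, and the rule that any single breach triggers a back-off are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Guardrails:
    """Hypothetical per-warehouse limits; field names are illustrative."""
    max_total_queue_time_s: float
    max_queued_queries: int
    max_p95_latency_s: float
    max_p99_latency_s: float
    max_latency_s: float


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct is 1 to 100)."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def should_back_off(latencies_s: Sequence[float], total_queue_time_s: float,
                    queued_queries: int, limits: Guardrails) -> bool:
    """Return True if any guardrail is breached (assumed trigger rule)."""
    if total_queue_time_s > limits.max_total_queue_time_s:
        return True
    if queued_queries > limits.max_queued_queries:
        return True
    if not latencies_s:
        return False  # no completed queries to evaluate
    return (
        percentile(latencies_s, 95) > limits.max_p95_latency_s
        or percentile(latencies_s, 99) > limits.max_p99_latency_s
        or max(latencies_s) > limits.max_latency_s
    )
```

For instance, with 100 queries of which four take 10 seconds and the rest 1 second, the 95th percentile stays at 1 second while the 99th percentile is 10 seconds, so only a 99th-percentile or max-latency guardrail set below 10 seconds would trigger a back-off.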
In complex, highly fluid cloud environments, Keebo Performance Guardrails offer control over how you balance cost optimization and performance. This allows you to let AI optimize without fear of unintended consequences.
Approach #3: Metadata only
Finally, Keebo is secure. Not only are we 100% SOC2 and GDPR compliant, but the fundamental structure of our platform removes virtually all risk of data loss or breach.
We do this by using zero customer data in our AI optimization models.
Instead, we only use Snowflake metadata. For those unfamiliar, metadata is the “data about data.” It includes elements such as performance telemetry, logs of query execution times, usage statistics, and more. In other words, metadata tells us how you used Snowflake, not what you used Snowflake to retrieve.
By aggregating and analyzing historical and real-time metadata, we’re able to gain insight into usage patterns, trends, peak usage times, and other information that helps us anticipate current and future demands. And we’re able to do that without needing access to sensitive user data, which means you never have to worry about it falling into the wrong hands.
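As a rough illustration of what metadata-only analysis can look like, the sketch below finds peak usage hours from query metadata alone. The record layout (`warehouse`, `start_time`, `elapsed_s`) is a hypothetical example and not Snowflake’s or Keebo’s actual schema. Note that the records describe when and how long queries ran, never the query text or the rows returned.

```python
from collections import Counter
from datetime import datetime

# Hypothetical metadata records: timing and usage only, no customer data.
records = [
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T09:05:00", "elapsed_s": 12.0},
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T09:20:00", "elapsed_s": 30.5},
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T09:45:00", "elapsed_s": 8.2},
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T14:10:00", "elapsed_s": 4.1},
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T14:30:00", "elapsed_s": 6.7},
    {"warehouse": "ANALYTICS", "start_time": "2024-05-01T02:00:00", "elapsed_s": 1.3},
]


def peak_hours(rows, top_n=3):
    """Return the hours of day with the most query starts, busiest first."""
    counts = Counter(datetime.fromisoformat(r["start_time"]).hour for r in rows)
    return [hour for hour, _ in counts.most_common(top_n)]


print(peak_hours(records, top_n=2))
```

A real pipeline would aggregate far more signals, but the principle is the same: usage patterns can be recovered from the metadata without touching the underlying data.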
Final thoughts on Keebo’s approach to cloud data warehouse optimization
Automating your warehouse operations in a secure and controllable manner is the most efficient way to maximize performance and minimize costs. Because this happens 24/7, you never miss an opportunity to save money: Keebo works even when your entire data team is asleep.
With those savings, you can redirect your efforts to high-priority tasks, more expensive workloads, and ongoing business innovation. Ultimately, Keebo doesn’t just save you time and money. It unlocks increased competitiveness and success.
Stop leaving money on the table. Start optimizing your warehouses with Keebo today. Schedule a demo to find out how.