Prometheus Instant Queries For Nomad Autoscaling
Understanding the Issue: Prometheus Range Queries and Nomad Autoscaler
When using the Nomad autoscaler with Prometheus, a common challenge arises from how the autoscaler interacts with Prometheus queries. The core issue revolves around the autoscaler's reliance on range-based queries, which return multiple data points over a specified time window, rather than a single, instantaneous value. This discrepancy can lead to complexities in configuring scaling policies accurately.
Currently, the Nomad autoscaler primarily supports windowed or range-based queries. This means that when you define a check, the autoscaler sends a request to Prometheus that includes a start and end time, essentially asking for a time series of data points. For instance, if you want to scale your service based on the average CPU usage over the last minute, the autoscaler fetches multiple values representing CPU usage within that minute.
The problem arises because the autoscaler's default behavior is to fetch data points at a fixed interval, often one second. This can result in a large number of data points, even if the underlying metrics are not as frequent. For example, if your CPU metrics are exported every 15 seconds, Prometheus might return 60 values for a one-minute window, but only four of those values are actually distinct. This granularity mismatch can make it difficult to set appropriate thresholds for scaling decisions.
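The granularity mismatch can be sketched numerically. The step and export interval below are illustrative assumptions chosen to match the example, not values read from any particular autoscaler configuration:

```python
# Sketch of the granularity mismatch: a 1m range query with a 1s step
# against a metric that is only exported every 15 seconds.
WINDOW_S = 60           # query_window = "1m"
STEP_S = 1              # the autoscaler's assumed internal step
EXPORT_INTERVAL_S = 15  # how often the exporter publishes a new sample

# Prometheus returns one value per step across the window...
returned_points = WINDOW_S // STEP_S

# ...but each exported sample is simply repeated until the next one
# arrives, so only a handful of the returned values are distinct.
distinct_points = WINDOW_S // EXPORT_INTERVAL_S

print(returned_points)  # 60 values in the result set
print(distinct_points)  # only 4 of them carry new information
```

Any threshold on "how many points must be in bounds" is therefore really a threshold on repeated copies of a few underlying samples.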
Consider the scenario where you want to scale up your service when the average CPU usage exceeds 80% over the last minute. Using the provided example:
check "avg_cpu_up" {
  source = "prometheus"
  query = "avg(nomad_alloc_cpu_usage{job=\"myservice\"}) * 100"
  query_window = "1m"
  group = "avg_cpu"
  strategy "threshold" {
    lower_bound = 80
    delta = 1
  }
}
This configuration translates into the autoscaler querying Prometheus for CPU usage data over the last minute. However, the crucial part is the within_bounds_trigger property (left at its default in the example above), which determines how many of the returned data points must meet the threshold for scaling to occur. Setting this value becomes problematic because the number of data points returned depends on the query window and the step size used by the autoscaler.
If the within_bounds_trigger is set too low, the service might scale up prematurely based on transient spikes in CPU usage. Conversely, if it's set too high, the service might not scale up even when sustained high CPU usage warrants it. Determining the correct value requires a deep understanding of the metric export rate and the autoscaler's internal workings, making it a fragile and unintuitive configuration.
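To make this fragility concrete, here is a minimal model of how a threshold check might count in-bounds data points. `should_scale` is a hypothetical helper written for illustration, not the actual autoscaler plugin logic:

```python
# Simplified model (not the real plugin code) of a threshold strategy
# evaluating a range-query result against within_bounds_trigger.
def should_scale(points, lower_bound, within_bounds_trigger):
    """Scale only if at least `within_bounds_trigger` points meet the bound."""
    in_bounds = sum(1 for p in points if p >= lower_bound)
    return in_bounds >= within_bounds_trigger

# A transient 2-second spike above 80% CPU in an otherwise idle minute:
spiky_minute = [20.0] * 58 + [95.0, 95.0]

# A low trigger scales up on the blip; a trigger sized for sustained
# load across the window correctly ignores it.
print(should_scale(spiky_minute, lower_bound=80, within_bounds_trigger=1))   # True
print(should_scale(spiky_minute, lower_bound=80, within_bounds_trigger=30))  # False
```

The "right" trigger value sits somewhere between these extremes, and it shifts whenever the export rate or step size changes, which is exactly the maintenance burden described above.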
Furthermore, the target-value strategy suffers from similar issues, as it often relies on the last value returned in the time series, which may not be representative of the overall trend. The documentation suggests that the query should return a single value, yet range queries inherently return multiple values, creating a disconnect between the intended behavior and the actual implementation. This discrepancy highlights the need for a more straightforward approach to querying Prometheus for autoscaling purposes.
Proposed Solution: Leveraging Prometheus Instant Queries
To address the challenges associated with range-based queries, the introduction of instant queries offers a more intuitive and robust solution. Instant queries, as the name suggests, return a single value representing the metric at a specific point in time. By utilizing instant queries, the autoscaler can make scaling decisions based on a clear, unambiguous metric, simplifying the configuration and improving the reliability of the scaling process.
The suggested solution involves modifying the check configuration to use a Prometheus query that aggregates the data over the desired time window within the query itself. This can be achieved using functions like avg_over_time, which calculates the average value of a metric over a specified period. By incorporating this function into the query, the autoscaler receives a single value representing the average CPU usage over the last minute, eliminating the need to grapple with multiple data points and the within_bounds_trigger property.
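The PromQL semantics can be sketched as follows. The allocation names and sample values are invented for illustration; the arithmetic mirrors what avg(avg_over_time(...)) computes, with each series averaged over the lookback window first and the per-series averages combined afterwards:

```python
# Sketch of avg(avg_over_time(metric[1m])) semantics, assuming two
# hypothetical allocations exporting CPU samples every 15 seconds.
samples_by_alloc = {
    "alloc-1": [0.70, 0.80, 0.90, 0.80],
    "alloc-2": [0.60, 0.60, 0.70, 0.70],
}

# avg_over_time(metric[1m]) averages each series over the window...
per_series_avg = {
    alloc: sum(vals) / len(vals) for alloc, vals in samples_by_alloc.items()
}

# ...and the outer avg() collapses the per-series averages into one
# scalar, so the autoscaler receives exactly one value.
single_value = sum(per_series_avg.values()) / len(per_series_avg) * 100
print(round(single_value, 1))  # 72.5
```

However many allocations or samples exist, the result set the autoscaler sees always contains exactly one value.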
Here's an example of how an instant query can be implemented:
check "avg_cpu_up" {
  source = "prometheus"
  query = "avg(avg_over_time(nomad_alloc_cpu_usage{job=\"myservice\"}[1m])) * 100"
  query_window = "0" # or query_instant = true
  group = "avg_cpu"
  strategy "threshold" {
    lower_bound = 80
    delta = 1
    within_bounds_trigger = 1
  }
}
In this configuration, the avg_over_time function calculates the average CPU usage over the last minute, and the outer avg function ensures that the result is a single value. The query_window is set to "0" (or query_instant = true could be used) to indicate that an instant query is desired. The within_bounds_trigger is set to 1, as only a single value is returned.
The key advantage of this approach is that it decouples the scaling policy from the underlying metric export rate and the autoscaler's internal step size. Regardless of how frequently the metrics are exported or how the autoscaler fetches data, the check definition remains consistent and reliable. This makes the configuration more maintainable and less prone to errors.
While the use of avg(avg_over_time(...)) might seem like an average of averages, it provides a reasonable approximation of the overall CPU usage trend. The benefit of simplicity and reliability outweighs the minor inaccuracies introduced by this approach. By adopting instant queries, the Nomad autoscaler can provide a more predictable and efficient scaling experience.
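A small sketch, using invented sample data, makes the caveat concrete: when one series contributes fewer samples to the window than another, the average of averages weights each series equally, while a global average would weight each sample equally:

```python
# The "average of averages" caveat: series with different sample counts
# are weighted equally per series, not per sample.
busy = [90.0, 90.0, 90.0, 90.0]  # alloc that exported every 15s
sparse = [10.0]                  # alloc that produced only one sample

# avg(avg_over_time(...)) style: each series counts once.
avg_of_avgs = (sum(busy) / len(busy) + sum(sparse) / len(sparse)) / 2

# Global average: each sample counts once.
global_avg = sum(busy + sparse) / len(busy + sparse)

print(avg_of_avgs)  # 50.0
print(global_avg)   # 74.0
```

In practice, allocations of the same job usually export at the same rate, so the two figures stay close, and the per-series form has the advantage of not over-weighting a noisy or fast-exporting allocation.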
Alternatives Considered
Several alternative approaches were considered before proposing the use of instant queries. Each of these alternatives has its own drawbacks and limitations.
- Adjusting within_bounds_trigger Manually: One option is to calculate the within_bounds_trigger value based on the query window and the metric export rate. This involves determining the number of data points returned by Prometheus and setting the within_bounds_trigger accordingly. However, this approach is fragile and prone to errors, as it relies on assumptions about the metric export rate and the autoscaler's internal step size. Any change to these parameters would require recalculating the within_bounds_trigger value, making the configuration difficult to maintain.

- Accepting Approximate Scaling: Another option is to leave the within_bounds_trigger at its default value and accept that scaling decisions might not be perfectly accurate. This can lead to sporadic scaling events or flapping, as the service might scale up or down based on transient spikes in CPU usage. While this might be acceptable in some cases, it is not ideal for critical applications where precise scaling is required.

- Automating HCL Generation: If the job HCL files are generated automatically, the within_bounds_trigger value could be calculated programmatically from the query window and the metric export rate. However, this approach is still fragile, depends on the accuracy of the input parameters, and adds complexity to the HCL generation process.

- Mimicking Instant Queries with a 1-Second Window: A workaround is to set the query_window to 1 second, which effectively mimics an instant query and results in a single data point being returned:

check "avg_cpu_up" {
  source = "prometheus"
  query = "avg(avg_over_time(nomad_alloc_cpu_usage{job=\"myservice\"}[1m])) * 100"
  query_window = "1s"
  group = "avg_cpu"
  strategy "threshold" {
    lower_bound = 80
    delta = 1
    within_bounds_trigger = 1
  }
}

While this approach does return a single value, it is a hack that depends on the autoscaler's hard-coded step size of 1 second. If that implementation detail changes in the future, the workaround could silently break, making it a less desirable solution than true instant queries.
Conclusion: Embracing Instant Queries for Reliable Autoscaling
In conclusion, the introduction of instant queries in the Nomad autoscaler offers a more reliable, intuitive, and maintainable solution for scaling services based on Prometheus metrics. By utilizing functions like avg_over_time within the query itself, the autoscaler can receive a single value representing the aggregated metric over the desired time window. This eliminates the complexities associated with range-based queries and simplifies the configuration of scaling policies.
While alternative approaches exist, they often involve fragile assumptions, manual calculations, or workarounds that are prone to errors. Instant queries provide a clean and consistent way to define scaling policies, decoupling them from the underlying metric export rate and the autoscaler's internal workings.
By embracing instant queries, the Nomad autoscaler can provide a more predictable and efficient scaling experience, ensuring that services are scaled appropriately based on their actual resource utilization. The use of instant queries promotes clarity and maintainability, making it easier to manage and troubleshoot scaling policies over time.
For more information on Prometheus and its query language, refer to the official Prometheus documentation, which provides comprehensive details on queries, functions, and best practices.