Falco Sidekick Panics With k8s.cluster.name: Fix It!
Are you experiencing panics in Falco Sidekick when specifying k8s.cluster.name in your custom fields? You're not alone! This article dives deep into this issue, providing a detailed explanation of the bug, steps to reproduce it, expected behavior, and a comprehensive analysis of the environment where this problem occurs. If you're struggling with this issue, you've come to the right place. Let's get started and resolve this together.
Understanding the Falco Sidekick Panic
When working with Falco and Falco Sidekick, incorporating Kubernetes cluster names into your alerts can significantly enhance your monitoring and security posture. The `k8s.cluster.name` custom field is particularly useful for distinguishing events originating from different clusters, especially in multi-cluster environments. However, specifying this custom field has been known to cause panics in Falco Sidekick, leading to disruptions in alert delivery and overall system stability. In this section, we will delve deeper into the specifics of this bug and its impact.
The core issue arises when you attempt to add `k8s.cluster.name` as a custom field, following recommendations to enrich Falco alerts with cluster-specific information. As detailed in the bug report, Falco Sidekick logs an error stating, "Custom field 'k8s.cluster.name' is not a valid OTLP metric attribute name." This initial error indicates a problem with how Falco Sidekick processes this particular custom field in the context of OpenTelemetry Protocol (OTLP) metrics, and it points to an incompatibility or validation issue within the metric processing pipeline.
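The logs don't show the exact validation rule Falco Sidekick applies, but Prometheus-style metric label names notoriously disallow dots, which would reject a name like `k8s.cluster.name` outright. Below is a minimal sketch of that kind of name check in Go; the regex and behavior are illustrative stand-ins, not Falco Sidekick's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// Prometheus-compatible label names must start with a letter or underscore
// and contain only letters, digits, and underscores -- dots are not allowed.
// This regex is a hypothetical stand-in for whatever validation Falco
// Sidekick performs internally.
var validLabelName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

func main() {
	for _, name := range []string{"k8s.cluster.name", "k8s_cluster_name"} {
		// Prints: "k8s.cluster.name" is rejected, "k8s_cluster_name" passes.
		fmt.Printf("%-20s valid: %v\n", name, validLabelName.MatchString(name))
	}
}
```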
Following the initial error, Falco Sidekick proceeds to panic, leading to a cascade of HTTP server errors. The error messages reveal an inconsistent label cardinality issue: the system expected 8 label values but got 7. This discrepancy highlights a deeper problem in how Prometheus metrics are generated and handled within Falco Sidekick. Adding a custom field like `k8s.cluster.name` seems to alter the expected number of labels, causing the metric generation process to fail.
The panic manifests as a series of `http: panic serving` errors, each associated with a different client IP address and port. These errors indicate that the HTTP server within Falco Sidekick is crashing while attempting to serve incoming requests. The goroutine traces included in the error logs point to specific lines within the `prometheus/client_golang` library and Falco Sidekick's handler functions, pinpointing exactly where the panics occur: the `github.com/prometheus/client_golang/prometheus.(*CounterVec).With` function and the `main.newFalcoPayload` function in `handlers.go` are implicated, suggesting issues in the metric counting and payload handling logic.
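To see how this class of panic arises in `prometheus/client_golang` generally, consider the minimal sketch below (not Falco Sidekick's actual code): a `CounterVec` declared with three label names panics when `With()` receives only two, mirroring the "expected 8 label values but got 7" failure in the logs. The metric name and labels here are invented for illustration:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// A counter vector declared with three label names. Every call to
	// With() must supply exactly these three labels.
	events := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "falco_events_total",
			Help: "Hypothetical per-rule event counter.",
		},
		[]string{"rule", "priority", "k8s_cluster_name"},
	)

	defer func() {
		if r := recover(); r != nil {
			// Prints something like:
			// inconsistent label cardinality: expected 3 label values but got 2 ...
			fmt.Println("recovered:", r)
		}
	}()

	// Supplying only two of the three declared labels makes With() panic --
	// the same failure mode Falco Sidekick hits at higher cardinality.
	events.With(prometheus.Labels{
		"rule":     "Terminal shell in container",
		"priority": "Notice",
	}).Inc()
}
```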
The impact of this bug is substantial. Falco Sidekick is designed to forward Falco alerts to various destinations, such as Webhooks, Slack, and more. When Falco Sidekick panics, it disrupts the entire alert forwarding mechanism, leading to potential security blind spots. Critical alerts about runtime security events may not be delivered, leaving systems vulnerable to ongoing attacks. Moreover, the continuous panics can lead to resource exhaustion and instability within the Kubernetes cluster, affecting other applications and services running alongside Falco Sidekick. Therefore, understanding and resolving this issue is crucial for maintaining a robust security and monitoring infrastructure.
Reproducing the Falco Sidekick Panic
To effectively address a bug, it's essential to reproduce it consistently. This section provides a step-by-step guide on how to reproduce the Falco Sidekick panic when specifying k8s.cluster.name in custom fields. By following these steps, you can verify the issue in your environment and confirm that the fix resolves the problem.
- Prerequisites: Ensure you have a Kubernetes cluster up and running. You should also have Helm installed, as it's the primary method for deploying Falco and Falco Sidekick in this scenario.

- Create a `values.yaml` File: The `values.yaml` file is used to configure the Falco Helm chart. Create a new `values.yaml` file with the following content. This configuration enables Falco Sidekick, sets `fullfqdn` to true, enables debug mode, and, most importantly, adds `k8s.cluster.name` as a custom field:

  ```yaml
  falcosidekick:
    enabled: true
    fullfqdn: true
    config:
      debug: true
      customfields: "k8s.cluster.name:my-k8s-cluster"
  ```

  In this file, the `customfields` parameter is set to `k8s.cluster.name:my-k8s-cluster`, which is the key element in triggering the panic. The `my-k8s-cluster` value is an example; you can use any string for your cluster name.

- Deploy Falco with the Custom Configuration: Use Helm to deploy Falco with the `values.yaml` file you created. Navigate to the directory containing your `values.yaml` file and run the following Helm command:

  ```bash
  helm install falco falcosecurity/falco -f values.yaml
  ```

  This command installs Falco into your Kubernetes cluster using the configuration specified in `values.yaml`. The `falcosecurity/falco` chart is fetched from the official Falco Helm chart repository. Make sure you have added the Falco chart repository to your Helm repositories if you haven't already:

  ```bash
  helm repo add falcosecurity https://falcosecurity.github.io/charts
  helm repo update
  ```

- Check Falco Sidekick Logs: After deploying Falco, check the logs of the Falco Sidekick pod. You should see the panic messages and error logs that indicate the issue. First, find the name of the Falco Sidekick pod:

  ```bash
  kubectl get pods -n default | grep falco-sidekick
  ```

  Replace `default` with the namespace where you deployed Falco if it's different. Then, use the pod name to view the logs:

  ```bash
  kubectl logs -n default <falco-sidekick-pod-name>
  ```

  In the logs, you should observe the following error messages:

  ```
  2025/11/26 22:53:25 [ERROR] : Custom field 'k8s.cluster.name' is not a valid OTLP metric attribute name
  ...
  2025/11/26 22:53:38 http: panic serving 10.13.78.36:38474: inconsistent label cardinality: expected 8 label values but got 7 in prometheus.Labels...
  ```

  These logs confirm that Falco Sidekick is panicking due to the `k8s.cluster.name` custom field.

- Verify the Fix: To verify that removing the custom field resolves the issue, edit the `values.yaml` file and comment out or remove the `customfields` line:

  ```yaml
  falcosidekick:
    enabled: true
    fullfqdn: true
    config:
      debug: true
      # customfields: "k8s.cluster.name:my-k8s-cluster"  # Removing this line
  ```

  Upgrade the Falco deployment with the modified `values.yaml`:

  ```bash
  helm upgrade falco falcosecurity/falco -f values.yaml
  ```

  Check the Falco Sidekick logs again. The panic messages should no longer appear, indicating that the issue is resolved.
By following these steps, you can consistently reproduce the Falco Sidekick panic and verify the effectiveness of any proposed solutions. This systematic approach is crucial for bug identification and resolution in complex systems.
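If you need to run the reproduction repeatedly, the commands above can be bundled into a single script. This is a convenience sketch assembled from the steps in this section; the `default` namespace, the sleep duration, and the loose `sidekick` grep pattern are assumptions you may need to adjust for your environment:

```bash
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE=default  # adjust if you deploy Falco elsewhere

# Add the Falco chart repository and deploy using the values.yaml from step 2.
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco -n "$NAMESPACE" -f values.yaml

# Give the pods time to start, then locate the Falco Sidekick pod.
# Pod naming can vary across chart versions, so match loosely on "sidekick".
sleep 60
POD=$(kubectl get pods -n "$NAMESPACE" -o name | grep -i sidekick | head -n 1)

# Surface the two signature errors from the bug report, if present.
kubectl logs -n "$NAMESPACE" "$POD" | grep -E "not a valid OTLP|panic serving" || true
```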
Expected Behavior
When configuring Falco Sidekick, the expectation is that custom fields, including `k8s.cluster.name`, should be correctly processed and included in the event/alert JSON payloads. This behavior is crucial for enriching alerts with contextual information, making them more valuable for security analysis and incident response. Understanding the expected behavior helps in identifying deviations and confirming that a fix is effective.
Ideally, when you add a custom field like `k8s.cluster.name` to your Falco Sidekick configuration, the following should occur:
- Successful Startup: Falco Sidekick should start without any errors related to the custom field configuration. The application should parse the `values.yaml` file, load the custom fields, and initialize its components without panicking or throwing exceptions.

- Inclusion in Alert Payloads: When Falco detects a security event and sends an alert to Falco Sidekick, the alert payload should include the custom field with the specified value. For example, if you configure `customfields: "k8s.cluster.name:my-k8s-cluster"`, every forwarded alert should carry `k8s.cluster.name` set to `my-k8s-cluster` in its output fields, as sketched in the example below.
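For concreteness, here is a sketch of roughly what an enriched alert payload should look like once the custom field is processed correctly. The rule name, timestamp, and other output fields below are illustrative values, not captured output; the key point is the `k8s.cluster.name` entry carrying the configured cluster name:

```json
{
  "output": "Notice A shell was spawned in a container (user=root container=nginx)",
  "priority": "Notice",
  "rule": "Terminal shell in container",
  "time": "2025-11-26T22:53:25.000000000Z",
  "source": "syscall",
  "output_fields": {
    "container.name": "nginx",
    "user.name": "root",
    "k8s.cluster.name": "my-k8s-cluster"
  }
}
```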