Xarray: `add_bounds` Creates Broken Dataset - Fixed!
Since version 2025.11.0, xarray's default for keep_attrs is True. The change is meant to preserve metadata, but it introduces a serious problem when calling cf-xarray's ds.cf.add_bounds(...): the coordinate's attributes are copied onto the newly created bounds variable, and cf-xarray then detects the bounds as a coordinate. This article explains the problem, demonstrates it with a practical example, and discusses how to resolve it.
Understanding the Problem with keep_attrs
The keep_attrs option in xarray controls whether the attributes of a DataArray or Dataset are retained by operations on it. With the new default of True, they are. That is usually desirable, but it causes trouble when cf-xarray's add_bounds creates coordinate bounds: the attributes of the original coordinate (e.g., the longitude coordinate lon) end up on the new bounds variable (e.g., lon_bounds). cf-xarray identifies coordinates from their attributes, so it now treats the bounds as a coordinate too, which produces incorrect metadata and can break any later analysis that relies on coordinate metadata. Bounds are not independent coordinates. They are auxiliary variables that describe the extent of each coordinate cell, and copying the coordinate's attributes onto them effectively promotes them to coordinate status, which is semantically wrong. Attributes therefore need to be managed carefully when bounds are created.
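The effect of keep_attrs on a bounds-style computation can be sketched with plain xarray. The coordinate below is hand-built and the half-cell offset of 1.25 is an arbitrary illustration, not cf-xarray's actual code. The option is set explicitly in both branches so the result does not depend on the default of the installed xarray version.

```python
import numpy as np
import xarray as xr

# Hand-built longitude coordinate carrying CF-style attributes (illustrative values).
lon = xr.DataArray(
    np.array([200.0, 202.5, 205.0]),
    dims="lon",
    name="lon",
    attrs={"standard_name": "longitude", "units": "degrees_east"},
)

# With keep_attrs=True, arithmetic on the coordinate carries its attributes along.
with xr.set_options(keep_attrs=True):
    lower_kept = lon - 1.25

# With keep_attrs=False, the same arithmetic yields a result without attributes.
with xr.set_options(keep_attrs=False):
    lower_dropped = lon - 1.25

print(lower_kept.attrs)
print(lower_dropped.attrs)
```

Any bounds variable assembled from results like lower_kept would carry standard_name and units, which is the situation described above.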
Illustrative Example: Air Temperature Dataset
To illustrate the issue, consider the air temperature dataset from xarray's tutorial. We load the dataset and then call ds.cf.add_bounds to create longitude bounds:
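import cf_xarray as cfxr  # importing cf_xarray registers the .cf accessor
import xarray as xr
ds = xr.tutorial.open_dataset('air_temperature')  # downloads the sample data on first use
dsb = ds.cf.add_bounds('longitude')  # adds a lon_bounds variable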
Before adding bounds, the ds.cf structure correctly identifies the coordinates:
Coordinates:
    CF Axes:        * X: ['lon']
                    * Y: ['lat']
                    * T: ['time']
                      Z: n/a
    CF Coordinates: * longitude: ['lon']
                    * latitude: ['lat']
                    * time: ['time']
                      vertical: n/a
[...]
However, after adding bounds, the dsb.cf structure incorrectly includes lon_bounds as a coordinate:
Coordinates:
    CF Axes:          X: ['lon', 'lon_bounds']
                    * Y: ['lat']
                    * T: ['time']
                      Z: n/a
    CF Coordinates:   longitude: ['lon', 'lon_bounds']
                    * latitude: ['lat']
                    * time: ['time']
                      vertical: n/a
As the output shows, lon_bounds is now listed under both CF Axes and CF Coordinates, where it does not belong. The cause is the automatic copying of attributes from the lon coordinate to lon_bounds: the attributes that mark lon as a CF longitude coordinate are carried over, so cf-xarray recognizes lon_bounds as a coordinate as well. This misidentification can cause unexpected behavior and errors in downstream analysis, and it shows the need for a mechanism that keeps bounds from inheriting coordinate-defining attributes.
Proposed Solution: Preventing Attribute Copying
One way to address this is to stop the attributes from being copied in the first place: add_bounds in cf-xarray could explicitly set keep_attrs=False while it creates the bounds, so that the bounds no longer inherit the coordinate's attributes under the new default of True. An alternative is to remove the coordinate-defining attributes from the bounds after creation, which means identifying the specific attributes that cause the misidentification and deleting them from the bounds variable. The first approach is generally more straightforward and less error-prone, because the bounds are created with the correct metadata from the outset and need no later cleanup. Either way, cf-xarray would continue to handle coordinates correctly under xarray's new keep_attrs default.
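The second approach, stripping attributes after creation, is something a user could also apply as a stopgap. The sketch below uses a hand-built dataset that imitates the faulty result rather than calling cf-xarray. The helper strip_coord_attrs is hypothetical, and the key list in COORD_ATTRS is an assumption about which attributes trigger the detection, so it may need adjusting.

```python
import numpy as np
import xarray as xr

# Assumed set of attribute keys that mark a variable as a coordinate;
# this list is illustrative, not taken from cf-xarray.
COORD_ATTRS = ("standard_name", "units", "axis")


def strip_coord_attrs(ds, bounds_name, keys=COORD_ATTRS):
    """Remove coordinate-defining attributes from a bounds variable, in place."""
    attrs = ds[bounds_name].attrs
    for key in keys:
        attrs.pop(key, None)
    return ds


lon_attrs = {"standard_name": "longitude", "units": "degrees_east"}
ds = xr.Dataset(
    coords={
        # The coordinate points at its bounds through the CF "bounds" attribute.
        "lon": ("lon", np.array([200.0, 202.5]), dict(lon_attrs, bounds="lon_bounds")),
        # Imitates the faulty result: the bounds carry a copy of the coordinate's attributes.
        "lon_bounds": (
            ("lon", "bounds"),
            np.array([[198.75, 201.25], [201.25, 203.75]]),
            dict(lon_attrs),
        ),
    }
)

strip_coord_attrs(ds, "lon_bounds")
print(ds["lon_bounds"].attrs)  # attributes left on the bounds variable
print(ds["lon"].attrs)         # the coordinate keeps its own attributes
```

The helper mutates the dataset it is given. Working on ds.copy(deep=True) instead would leave the original untouched at the cost of copying the data.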
Implementing the Fix: A Pull Request Suggestion
A pull request (PR) addressing this issue is highly recommended. It should modify add_bounds in cf-xarray so that keep_attrs is set to False while the bounds are created, which prevents the attribute copying and stops bounds from being misidentified as coordinates. It should include tests that verify the fix across different coordinate types and dataset structures and confirm that it introduces no new issues, along with clear documentation of the issue and the solution so that other developers can understand and maintain the change.
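A rough sketch of what such a change could look like is given here. guess_bounds_1d is a hypothetical helper, not cf-xarray's actual implementation. It assumes a uniformly spaced 1-D coordinate and wraps the arithmetic in xr.set_options(keep_attrs=False) so that the resulting bounds start out with empty attributes.

```python
import numpy as np
import xarray as xr


def guess_bounds_1d(coord: xr.DataArray, out_dim: str = "bounds") -> xr.DataArray:
    """Hypothetical helper: cell bounds for a uniformly spaced 1-D coordinate."""
    (dim,) = coord.dims
    # Drop attributes for every operation in this block,
    # regardless of the session-wide keep_attrs setting.
    with xr.set_options(keep_attrs=False):
        half_step = coord.diff(dim).mean() / 2
        lower = coord - half_step
        upper = coord + half_step
        bounds = xr.concat([lower, upper], dim=out_dim)
    return bounds.transpose(dim, out_dim).rename(f"{coord.name}_bounds")


lon = xr.DataArray(
    np.array([200.0, 202.5, 205.0]),
    dims="lon",
    name="lon",
    attrs={"standard_name": "longitude", "units": "degrees_east"},
)
bounds = guess_bounds_1d(lon)
print(bounds.name, bounds.dims)
print(bounds.attrs)  # attributes on the bounds
print(lon.attrs)     # the coordinate's attributes are unchanged
```

Scoping the option with a context manager keeps the change local: code outside the helper still sees whatever keep_attrs setting the user has chosen. A regression test for the real fix could assert that the bounds variable has none of the coordinate's attributes and that it no longer appears among the detected coordinates.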
Benefits of the Fix
The fix has several benefits. First, cf-xarray again distinguishes coordinates from bounds correctly, which prevents misinterpretation and errors in downstream analysis that relies on coordinate metadata. Second, the dataset's metadata stays semantically correct, because bounds no longer pick up attributes that belong only to the coordinate. Third, users can call add_bounds without manually cleaning up attributes afterwards.
Conclusion
The change of xarray's default keep_attrs to True introduced an unexpected issue when creating coordinate bounds with cf-xarray: the coordinate's attributes are copied to the new bounds, cf-xarray detects the bounds as a coordinate, and the resulting dataset is broken. The recommended fix is to set keep_attrs=False while the bounds are created inside add_bounds. A pull request implementing this change would greatly benefit the cf-xarray community.
For more information on xarray attributes, visit the xarray documentation.