Generative Data Assimilation of Sparse Weather Station Observations at Kilometer Scales

(2406.16947)
Published Jun 19, 2024 in cs.LG and physics.ao-ph

Abstract

Data assimilation of observational data into full atmospheric states is essential for weather forecast model initialization. Recently, methods for deep generative data assimilation have been proposed which allow for using new input data without retraining the model. They could also dramatically accelerate the costly data assimilation process used in operational regional weather models. Here, in a central US testbed, we demonstrate the viability of score-based data assimilation in the context of realistically complex km-scale weather. We train an unconditional diffusion model to generate snapshots of a state-of-the-art km-scale analysis product, the High Resolution Rapid Refresh. Then, using score-based data assimilation to incorporate sparse weather station data, the model produces maps of precipitation and surface winds. The generated fields display physically plausible structures, such as gust fronts, and sensitivity tests confirm learnt physics through multivariate relationships. Preliminary skill analysis shows the approach already outperforms a naive baseline of the High-Resolution Rapid Refresh system itself. By incorporating observations from 40 weather stations, 10% lower RMSEs on left-out stations are attained. Despite some lingering imperfections such as insufficiently disperse ensemble DA estimates, we find the results overall an encouraging proof of concept, and the first at km-scale. It is a ripe time to explore extensions that combine increasingly ambitious regional state generators with an increasing set of in situ, ground-based, and satellite remote sensing data streams.

Figure: Dependence of assimilation on station density, evaluated by RMSE for varying numbers of stations (2017 data).

Overview

  • The paper introduces a deep generative approach to data assimilation for weather forecasting, reconstructing kilometer-scale fields of precipitation and surface winds from sparse weather station observations.

  • The methodology hinges on Score-based Data Assimilation (SDA): an unconditional diffusion model is trained on HRRR analysis data, and its sampling process is guided at inference time to match sparse observations, so no retraining is needed when the observations change.

  • In a preliminary skill analysis, the framework attains lower RMSE and MAE on left-out stations than the HRRR analysis itself, and sensitivity tests suggest it has learned physical relationships among atmospheric variables.

Introduction

The paper applies a deep generative model to the data assimilation problem in weather forecasting, using sparse weather station observations to reconstruct atmospheric fields at kilometer scale. The authors use an unconditional diffusion model as a learned prior over atmospheric states, which offers a potentially more efficient way to generate high-resolution states from sparse data. The approach is notable for its simplicity and scalability, and for its ability to incorporate new observations without retraining the model.

Methodology

The framework applies Score-based Data Assimilation (SDA) to km-scale HRRR analysis data. The core methodology can be broken down into two main components:

Diffusion Model Training:

  • The diffusion model is trained unconditionally on HRRR (High Resolution Rapid Refresh) analysis data, which provides high-resolution atmospheric states.
  • The model generates 3 km-resolution surface fields: the zonal and meridional components of 10 m wind, and precipitation.
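
For orientation, a common way to train such a model is the denoising objective below. This is a generic sketch, not necessarily the paper's exact loss: the denoiser D_\theta, the noise level \sigma, and its sampling distribution p(\sigma) are assumed notation, and any loss weighting is omitted. The second expression is the standard link between a trained denoiser and the score of the noised data distribution.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim p_{\text{HRRR}},\; \sigma \sim p(\sigma),\; \varepsilon \sim \mathcal{N}(0, I)}
    \left\| D_\theta(x + \sigma \varepsilon,\, \sigma) - x \right\|^2,
\qquad
\nabla_{x_\sigma} \log p_\sigma(x_\sigma)
  \approx \frac{D_\theta(x_\sigma, \sigma) - x_\sigma}{\sigma^2}
```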

Score-based Data Assimilation:

  • The trained diffusion model is utilized during data assimilation to incorporate sparse weather station observations.
  • Sampling proceeds iteratively: at each step the noised state (x) is mapped to a denoised estimate (\hat{x}), and the update is guided by the difference between observations simulated from \hat{x} and the actual weather station data. The trained model itself is left unchanged.
  • This iterative updating process is informed by the learned score function, which helps in reconstructing physically plausible atmospheric states.
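
The guided update described above can be sketched on a toy problem. The example below is an illustration under strong assumptions, not the paper's implementation: a 1D Gaussian prior with a smooth covariance stands in for the trained diffusion model, so the denoiser and the observation term are available in closed form (with a trained network the observation term has to be approximated). The names `denoise` and `assimilate`, the station indices, and the noise schedule are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
grid = np.arange(n)
# Toy zero-mean Gaussian prior with a smooth covariance (assumption: stands in for HRRR statistics).
C = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 6.0) ** 2)
I = np.eye(n)

def denoise(x_noisy, sigma):
    """Analytic E[x | x_noisy] for the toy prior; a trained network would play this role."""
    J = C @ np.linalg.inv(C + sigma**2 * I)  # Jacobian of the denoiser w.r.t. x_noisy
    return J @ x_noisy, J

# Synthetic truth and sparse, noisy "station" observations at hypothetical grid indices.
x_true = np.linalg.cholesky(C + 1e-6 * I) @ rng.standard_normal(n)
stations = np.array([3, 11, 20, 28, 37, 45, 52, 60])
H = I[stations]  # observation operator: select station grid points
sigma_obs = 0.1
y = H @ x_true + sigma_obs * rng.standard_normal(len(stations))

def assimilate(y, H, sigma_obs, n_steps=200, sigma_max=50.0, sigma_min=0.01):
    """Euler integration of the probability-flow ODE with an observation-guided score."""
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    x = sigma_max * rng.standard_normal(n)  # start from (almost) pure noise
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        x_hat, J = denoise(x, s)
        prior_score = (x_hat - x) / s**2
        # Observation term: Gaussian likelihood of y given the noisy state.
        # For this toy prior, Cov[x | x_noisy] = s^2 * J, so the term is exact.
        R = sigma_obs**2 * np.eye(len(y)) + s**2 * (H @ J @ H.T)
        guidance = J.T @ H.T @ np.linalg.solve(R, y - H @ x_hat)
        # Step from noise level s down to s_next along the guided score.
        x = x + (s - s_next) * s * (prior_score + guidance)
    return x

x_analysis = assimilate(y, H, sigma_obs)
print("max misfit at stations:", np.abs(H @ x_analysis - y).max())
```

Changing `stations` or `y` requires no change to `denoise`, which mirrors the property that new observations can be used without retraining.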

Materials and Methods

The authors employ the HRRR dataset and NOAA Integrated Surface Database (ISD) for their experiments. Specifically, they focus on a region encompassing Oklahoma and its adjacent areas, known for its dense network of weather stations and its dynamic weather patterns, including convective precipitation.

HRRR Data:

  • The HRRR is a cloud-resolving atmospheric model running at 3 km resolution.
  • The authors extract the 10 m wind components and total precipitation.

NOAA ISD Data:

  • This dataset comprises hourly data from more than 14,000 weather stations globally.
  • For their studies, the authors use wind and precipitation data from stations within the chosen region.
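
Comparing gridded HRRR fields with station reports requires an observation operator that samples the grid at station locations. A minimal sketch follows, assuming a nearest-grid-point lookup; the function names, the toy regular latitude/longitude grid, and the station coordinates are hypothetical, and the squared-degree distance is a crude stand-in for a proper map-projection distance.

```python
import numpy as np

def nearest_grid_indices(grid_lat, grid_lon, st_lat, st_lon):
    """Return (rows, cols) of the nearest grid cell for each station.

    grid_lat, grid_lon: 2D arrays (ny, nx); st_lat, st_lon: 1D arrays (n_stations,).
    """
    d2 = (grid_lat[None] - st_lat[:, None, None]) ** 2 \
       + (grid_lon[None] - st_lon[:, None, None]) ** 2
    flat = d2.reshape(len(st_lat), -1).argmin(axis=1)
    return np.unravel_index(flat, grid_lat.shape)

def observe(state, rows, cols):
    """Sample a (channels, ny, nx) state at station cells -> (channels, n_stations)."""
    return state[:, rows, cols]

# Toy 4 x 5 grid and two made-up stations.
lat = np.linspace(34.0, 37.0, 4)
lon = np.linspace(-100.0, -96.0, 5)
grid_lat, grid_lon = np.meshgrid(lat, lon, indexing="ij")
rows, cols = nearest_grid_indices(
    grid_lat, grid_lon, np.array([35.1, 36.8]), np.array([-97.9, -99.6])
)
state = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)  # e.g. u, v, precipitation
print(rows, cols)
print(observe(state, rows, cols))
```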

Results

Data Assimilation with Pseudo-Observations

The authors first validate their methodology by assimilating increasingly sparse pseudo-observations extracted from the HRRR dataset. Their experiments reveal that even when the data is highly subsampled, the SDA framework can produce atmospheric states that are qualitatively similar to the original HRRR data.
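
Pseudo-observations of this kind can be produced by sampling a gridded field at a shrinking number of random grid points. The sketch below is illustrative only: `sample_pseudo_obs`, the random stand-in field, and the observation counts are assumptions, not the paper's experimental settings.

```python
import numpy as np

def sample_pseudo_obs(state, n_obs, rng):
    """Draw n_obs distinct grid points from a (channels, ny, nx) state.

    Returns the row indices, column indices, and sampled values (channels, n_obs).
    """
    _, ny, nx = state.shape
    flat = rng.choice(ny * nx, size=n_obs, replace=False)
    rows, cols = np.unravel_index(flat, (ny, nx))
    return rows, cols, state[:, rows, cols]

rng = np.random.default_rng(0)
state = rng.normal(size=(3, 32, 32))  # random stand-in for an HRRR snapshot
for n_obs in (512, 64, 8):            # increasingly sparse observation sets
    rows, cols, values = sample_pseudo_obs(state, n_obs, rng)
    print(n_obs, values.shape)
```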

Assimilation of Real Observations

The framework is then applied to actual weather station data and evaluated on stations withheld from the assimilation:

  • With around 20 stations, the SDA framework outperforms the HRRR analysis in reconstructing wind variables at left-out stations.
  • Using more stations reduces RMSE further; the abstract reports 10% lower RMSE on left-out stations when 40 stations are assimilated.
  • The generated states display physically plausible structures, such as gust fronts, which support the hypothesis that the model has learned meaningful physical relationships from the data.
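
The left-out-station protocol can be sketched as a random split of station indices followed by an RMSE computed only on the withheld subset. The station count, the synthetic observations, and the helper names below are hypothetical.

```python
import numpy as np

def split_stations(n_stations, n_assimilated, rng):
    """Randomly split station indices into assimilated and held-out sets."""
    perm = rng.permutation(n_stations)
    return np.sort(perm[:n_assimilated]), np.sort(perm[n_assimilated:])

def held_out_rmse(field_at_stations, obs, held_out):
    """RMSE between a field sampled at stations and the observations, held-out stations only."""
    diff = field_at_stations[held_out] - obs[held_out]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
n_stations = 60                      # made-up station count
obs = rng.normal(size=n_stations)    # synthetic station observations
assimilated, held_out = split_stations(n_stations, 40, rng)

# A field with a constant 2-unit bias at every station has a held-out RMSE of 2.
print(held_out_rmse(obs + 2.0, obs, held_out))
```

Only the `assimilated` indices would be passed to the assimilation step; the `held_out` indices are reserved for scoring both the assimilated state and the baseline.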

Learned Physics

The paper also explores the model's ability to generate plausible states for held-out variables. For example, by leaving out the meridional wind component and using zonal wind and precipitation data only, the model successfully reconstructs realistic meridional wind fields. This observation underscores the model’s ability to exploit learned physical relationships among different state variables.
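
A Gaussian analogy shows why an unobserved variable can be recovered at all. It is a simplified illustration, not the paper's model: if two variables u and v are jointly Gaussian with means \mu_u, \mu_v, standard deviations s_u, s_v, and correlation \rho (all assumed notation), then observing u shifts the best estimate of v and shrinks its uncertainty.

```latex
\mathbb{E}[v \mid u] = \mu_v + \rho \, \frac{s_v}{s_u} \, (u - \mu_u),
\qquad
\operatorname{Var}[v \mid u] = s_v^2 \, (1 - \rho^2)
```

A diffusion model trained on multivariate states can encode far richer, nonlinear dependencies, but the principle is the same: information about the observed variables propagates to the unobserved one through the learned joint distribution.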

Quantitative Evaluation

The performance of SDA is also quantitatively evaluated by comparing the RMSE, MAE, and CRPS (Continuous Ranked Probability Score) against the HRRR analysis. The results indicate that:

  • SDA provides lower RMSE and MAE compared to HRRR, particularly for wind variables.
  • The CRPS of the SDA ensembles also indicates an encouraging level of skill, although the abstract notes that the ensemble estimates remain insufficiently dispersed.
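
For reference, the three metrics can be computed as in the sketch below. The CRPS uses the standard ensemble estimator, E|X - y| - 0.5 E|X - X'|; the function names and the toy numbers are illustrative, and the paper's exact evaluation code is not given in this summary.

```python
import numpy as np

def rmse(pred, obs):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(pred, obs):
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean(np.abs(pred - obs)))

def crps_ensemble(ens, obs):
    """Ensemble CRPS averaged over points.

    ens: (members, n_points) ensemble values; obs: (n_points,) observations.
    """
    ens, obs = np.asarray(ens, float), np.asarray(obs, float)
    term1 = np.abs(ens - obs[None]).mean(axis=0)                     # E|X - y|
    term2 = np.abs(ens[:, None] - ens[None, :]).mean(axis=(0, 1))    # E|X - X'|
    return float(np.mean(term1 - 0.5 * term2))

print(rmse([1, 2, 5], [1, 2, 3]), mae([1, 2, 5], [1, 2, 3]))
print(crps_ensemble([[1.0], [2.0], [3.0]], [2.0]))
```

With a single ensemble member the spread term vanishes and the CRPS reduces to the MAE, which is what allows a deterministic analysis such as HRRR to be scored on the same scale as an ensemble.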

Discussion

The study is presented as an encouraging proof of concept for generative data assimilation from sparse observations, and according to the authors the first at km scale:

  • Practical Implications: The proposed SDA methodology is simple and scalable, admitting flexible incorporation of new observation streams. This offers potential cost and latency advantages over traditional data assimilation systems.
  • Theoretical Implications: The findings suggest that deep generative models can effectively learn and reproduce physical relationships in atmospheric data, paving the way for more sophisticated and higher-dimensional applications.

Future Directions

The authors note several avenues for future research:

  • Incorporating additional types of observational data (e.g., satellite and radar data) to improve the robustness and accuracy of the model.
  • Exploring more complex observation operators and the inclusion of temporal dynamics for enhanced state estimation.
  • Further investigation into ensemble spread calibration to address under-dispersiveness and improve the reliability of probabilistic forecasts.

Conclusion

The research presented in this paper marks an important step towards using deep generative models for high-resolution weather data assimilation. By pragmatically addressing the complexities of incorporating sparse and varied observational data, the proposed SDA framework shows promise in enhancing both the efficiency and accuracy of atmospheric state reconstructions at kilometer scales.
