Data assimilation challenges in the age of big Earth data

Sequences of GOES-16 tropospheric water vapor imagery are used to calculate NOAA’s derived motion winds (panel b, in black). Atmospheric winds help to reduce initial condition uncertainty when assimilated into weather models. Dense feature tracking using deep optical flow provides more complete information about atmospheric wind speed and direction (panel c), but produces too much data to be handled by traditional assimilation systems (Vandal et al., 2022).

Weather and climate models run on some of the world’s most powerful computers to solve the mathematical equations that describe the atmosphere. In traditional approaches to weather forecasting, observations are assimilated into numerical weather prediction (NWP) models to produce the best estimate of the current state of the atmosphere, which then serves as the starting point for the forecast.
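To make "best estimate" concrete, here is a minimal sketch of the assimilation step in its simplest linear, Gaussian form (the Kalman, or optimal interpolation, update). It is an illustration only, not the scheme used by any operational center or by LENS. The function name `analysis_update`, the two-point temperature state, and all error covariances are assumptions chosen for the example.

```python
import numpy as np

def analysis_update(x_b, B, y, H, R):
    """Blend a model background with observations (linear, Gaussian case).

    x_b : background (prior) state vector from the model
    B   : background error covariance
    y   : observation vector
    H   : linear observation operator mapping state space to observation space
    R   : observation error covariance
    """
    innovation = y - H @ x_b              # what the observations add to the model
    S = H @ B @ H.T + R                   # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)        # gain: how much to trust the observations
    x_a = x_b + K @ innovation            # analysis (posterior) state
    A = (np.eye(len(x_b)) - K @ H) @ B    # analysis error covariance
    return x_a, A

# Hypothetical toy problem: temperatures (K) at two neighboring grid points,
# with only the first point observed.
x_b = np.array([280.0, 282.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])                # background errors are correlated
H = np.array([[1.0, 0.0]])                # the observation sees point 1 only
R = np.array([[1.0]])
y = np.array([281.0])

x_a, A = analysis_update(x_b, B, y, H, R)
print(x_a)   # analysis state
print(A)     # analysis error covariance
```

In this toy case the unobserved second point is also adjusted, because the background error covariance `B` spreads the information from the single observation to its neighbor. The matrices `B`, `H`, and `R` grow with the size of the model state and the number of observations, which is one reason the cost of this step rises as more data is assimilated.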

While the volume of Earth data collected by public and private-sector satellites is growing rapidly, only a fraction of the information is currently assimilated into NWP models. Data assimilation is limited by several factors:

  • Computing power. NWP models require a significant amount of computing power to run. As the amount of data increases, so does the amount of computing power required to process it. This can lead to slower processing times and delays in the assimilation of new data.
  • Data storage. NWP models need large amounts of storage for observations and assimilated data. As the volume of data increases, so does the need for storage space. This can limit the amount of new data that can be assimilated in real time.
  • Communication bandwidth. NWP models require high-speed communication networks to transfer large amounts of data between data sources, processing centers, and end users. Limited communication bandwidth can slow down the assimilation process.
  • Model complexity. NWP models are becoming increasingly complex, with more variables and higher resolutions requiring more computational resources. This can limit the amount of data that can be assimilated in real time, especially when dealing with high-resolution data.

As the quantity of Earth data available continually increases, machine learning has the potential to perform complex predictive tasks more accurately, supplementing or even replacing time-consuming portions of traditional data assimilation. Machine learning algorithms can often identify complex patterns and spatiotemporal relationships that are difficult for traditional data assimilation methods to detect. Due to their efficiency, AI models can process a larger volume of satellite data to capture more information about the atmosphere than traditional numerical models.

Our team is developing assimilation and forecasting techniques that replace computationally expensive physics-based models with data-driven models. Our Low-latency Environmental Prediction from Neural Systems (LENS) ingests global geostationary satellite data to produce weather forecasts in a fraction of the time required by leading government models. With continuously improving accuracy, LENS makes the performance and high resolution of regional weather models available at a global scale.


[1] Vandal, T. J., Duffy, K., McCarty, W., Sewnath, A., & Nemani, R. (2022, August). Dense Feature Tracking of Atmospheric Winds with Deep Optical Flow. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1807-1815).