Global weather forecasts depend on petabyte-scale datasets and are generated on some of the world’s largest supercomputers. Until now, the resources required have severely limited the number of organizations capable of producing global weather forecasts. Using generative AI, we have developed EarthNet, a multi-modal foundation model that performs global data assimilation directly from Earth observations roughly 1000x more efficiently. We have used EarthNet to generate a novel 3D atmospheric reanalysis dataset at 0.16 degree resolution, with many more applications to come.
Pure-AI global weather forecasting models like GraphCast, AIFS, and FourCastNet have made global forecasting far more efficient. However, these AI models still depend on initial conditions generated by physics-based data assimilation systems operated by the European Centre for Medium-Range Weather Forecasts (ECMWF) and the United States’ National Oceanic and Atmospheric Administration (NOAA).
Observations collected from satellites, weather stations, balloons, ships, aircraft, and more are ingested into traditional physics-based models to estimate the current state of the atmosphere. This process, known as data assimilation (DA), is the most computationally expensive part of weather forecasting, taking 3-6 hours to produce an estimate of the model state.
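For intuition, data assimilation blends a prior model state (the "background") with new observations, weighting each by its estimated uncertainty. Below is a minimal, illustrative sketch of the classic optimal-interpolation update; the toy dimensions, values, and variable names are ours and do not reflect any operational system.

```python
import numpy as np

# Toy optimal-interpolation analysis step: x_a = x_b + K (y - H x_b)
# All shapes and values are illustrative, not from any operational DA system.
n_state, n_obs = 5, 3

x_b = np.array([280.0, 282.0, 281.0, 279.5, 278.0])   # background state (e.g., temperatures)
H = np.zeros((n_obs, n_state))
H[0, 0] = H[1, 2] = H[2, 4] = 1.0                      # observation operator: obs of states 0, 2, 4
y = np.array([281.0, 280.2, 277.1])                    # observed values

B = np.eye(n_state) * 1.0                              # background-error covariance (toy)
R = np.eye(n_obs) * 0.5                                # observation-error covariance (toy)

# Kalman gain: K = B H^T (H B H^T + R)^{-1}
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)                          # analysis: background corrected toward obs
print(x_a)
```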
This large gap between observation time and forecast availability means that many extreme, high-impact weather events are not captured in time. While pure-AI forecasts run faster than physics-based forecasts, the forward forecasting steps account for only a small fraction of the total computational cost; data assimilation dominates. More efficient, higher-frequency DA would provide more timely information for extreme events.
A pure-AI data assimilation system must ingest incomplete information from all types of data across space and time and generate a gap-filled, analysis-ready dataset. EarthNet tackles this problem using multi-modal Masked Autoencoders capable of gap-filling and forecasting sequences of spatio-temporal data.
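To illustrate the idea, here is a minimal sketch of our own (not EarthNet’s actual architecture): observed tokens are encoded, missing positions are filled with a learned mask token, and a lightweight decoder predicts values at every position, observed or not. For brevity this sketch follows the simpler masked-modeling pattern of substituting mask tokens before encoding rather than dropping them from the encoder entirely.

```python
import torch
import torch.nn as nn

class TinyMaskedGapFiller(nn.Module):
    """Minimal masked-modeling gap filler (illustrative only, not EarthNet's architecture)."""
    def __init__(self, dim=64, n_tokens=256):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))         # stands in for unobserved tokens
        self.pos = nn.Parameter(torch.randn(1, n_tokens, dim) * 0.02)  # learned positional embeddings
        self.head = nn.Linear(dim, dim)                                # reconstruct token features

    def forward(self, tokens, observed):
        # tokens: (B, N, dim) embedded observations; observed: (B, N) bool, True where data exists
        x = torch.where(observed.unsqueeze(-1), tokens, self.mask_token) + self.pos
        return self.head(self.decoder(self.encoder(x)))                # predictions for every position

# Example: batch of 2 sequences of 256 tokens, roughly 70% observed
model = TinyMaskedGapFiller()
toks, obs = torch.randn(2, 256, 64), torch.rand(2, 256) < 0.7
print(model(toks, obs).shape)  # torch.Size([2, 256, 64])
```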
Multi-modal AI is a powerful tool for learning contextual representations from a variety of data sources. We have seen this with GPT-4o and Gemini, which use vision, language, and speech modalities to enable more user-friendly interfaces. Similarly, multi-modal AI can be leveraged to make the most of Earth observations.
The data used to train the model comes from a variety of publicly accessible sources and includes sounders, imagers, and Level 2 products from low-Earth-orbiting and geostationary satellites. Each data source is treated as a different modality, with modality-specific embeddings. We have found that increasing the number of data types increases reconstruction accuracy, even across seemingly disparate feature types such as thermal infrared and surface pressure. In future development, we fully expect that scaling the number and variety of data sources will continue to improve accuracy.
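A hedged sketch of what modality-specific embeddings can look like in PyTorch; the modality names and channel counts below are placeholders of ours, not EarthNet’s actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical modalities and channel counts, for illustration only.
MODALITY_CHANNELS = {"ir_sounder": 22, "geo_imager": 16, "surface_pressure": 1}
TOKEN_DIM = 64

class ModalityEmbedder(nn.Module):
    """Project each modality's patches into a shared token space and tag them
    with a learned per-modality embedding (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.ModuleDict(
            {name: nn.Linear(ch, TOKEN_DIM) for name, ch in MODALITY_CHANNELS.items()})
        self.modality_emb = nn.ParameterDict(
            {name: nn.Parameter(torch.zeros(TOKEN_DIM)) for name in MODALITY_CHANNELS})

    def forward(self, patches: dict) -> torch.Tensor:
        # patches[name]: (batch, n_patches, channels) for that modality
        tokens = [self.proj[name](x) + self.modality_emb[name] for name, x in patches.items()]
        return torch.cat(tokens, dim=1)  # one mixed token sequence across modalities

# Example: two modalities with different patch counts and channel depths
emb = ModalityEmbedder()
out = emb({"ir_sounder": torch.randn(2, 10, 22), "geo_imager": torch.randn(2, 40, 16)})
print(out.shape)  # torch.Size([2, 50, 64])
```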
One of the challenges of working with satellite data is the high incidence of missing data between revisits. Masked modeling handles missing data well: we tokenize the data frame by frame and simply ignore missing tokens. In fact, multi-modal training is so data-efficient that the model learns best when 98% or more of the possible tokens are dropped. At inference time, the model’s any-modality-to-all-modalities capability produces 100% global coverage for every feature, even where that feature has never been observed, such as geostationary satellite data over the poles.
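A minimal sketch of this masking logic, under our own assumption that missing tokens are flagged with NaNs: tokens with missing data are dropped outright, and only a small random fraction of the remainder is kept for training.

```python
import torch

def select_training_tokens(tokens: torch.Tensor, keep_fraction: float = 0.02):
    """Drop tokens with missing data (NaNs) and randomly keep ~2% of the rest.
    Illustrative sketch only; `tokens` is (n_tokens, dim) for one sample."""
    valid = ~torch.isnan(tokens).any(dim=-1)           # tokens with no missing values
    valid_idx = valid.nonzero(as_tuple=True)[0]
    n_keep = max(1, int(keep_fraction * valid_idx.numel()))
    keep = valid_idx[torch.randperm(valid_idx.numel())[:n_keep]]
    return tokens[keep], keep                           # visible tokens and their positions

# Example: 1,000 tokens of dimension 64, some of them missing
toks = torch.randn(1000, 64)
toks[::7] = float("nan")                                # mark every 7th token as missing
visible, positions = select_training_tokens(toks)
print(visible.shape)                                    # roughly (17, 64)
```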
EarthNet was trained for three weeks on 16 GPUs. To build the training set, 500 TB of data were downloaded and regridded onto a common grid, producing a 2 TB training dataset. While the training compute is substantial, inference is highly efficient and up to 1000x faster than current DA systems.
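A hedged sketch of the kind of regridding step involved, using xarray from the open-source Pangeo stack mentioned in the acknowledgments; the file name, variable name, and interpolation choice are placeholders for illustration, not the actual pipeline.

```python
import numpy as np
import xarray as xr

# Target common grid (0.16 degrees, matching the reanalysis resolution mentioned above).
target = xr.Dataset(coords={
    "lat": np.arange(-90, 90, 0.16),
    "lon": np.arange(0, 360, 0.16),
})

# Placeholder source file and variable name; real inputs span many sensors and formats.
src = xr.open_dataset("example_sensor_granule.nc")
regridded = src["brightness_temperature"].interp(
    lat=target["lat"], lon=target["lon"], method="linear")
regridded.to_netcdf("regridded_granule.nc")
```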
We evaluated the performance of EarthNet’s hourly 3D atmospheric structure against the current best-in-class reanalysis datasets, ECMWF’s ERA5 and NASA’s MERRA-2. For ground truth, we used data from radiosondes, balloon-borne instruments that directly measure the atmospheric vertical profile at a single location. EarthNet’s 3D relative humidity predictions outperform the MERRA-2 and ERA5 reanalyses by 10-60% in the middle troposphere and lower stratosphere (5 to 20 km altitude). At all levels of the atmosphere, EarthNet’s temperature predictions are statistically similar to those of a microwave sounder, a sensor modality that has been shown to greatly improve data assimilation accuracy. More results and methodology can be found in our paper. Overall, our analysis shows that EarthNet is comparable to current approaches, with significant potential to improve resolution and accuracy.
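As a rough illustration of this kind of evaluation (our own sketch with synthetic numbers, not the methodology in the paper): match model predictions to radiosonde observations and compute error statistics by altitude bin.

```python
import numpy as np

def rmse_by_altitude(pred, obs, altitudes_km, bins=(0, 5, 10, 15, 20)):
    """RMSE of predicted vs radiosonde-observed values, grouped into altitude bins.
    `pred`, `obs`, `altitudes_km` are flat arrays of matched samples (illustrative)."""
    results = {}
    for lo, hi in zip(bins[:-1], bins[1:]):
        sel = (altitudes_km >= lo) & (altitudes_km < hi)
        if sel.any():
            results[f"{lo}-{hi} km"] = float(np.sqrt(np.mean((pred[sel] - obs[sel]) ** 2)))
    return results

# Hypothetical matched relative-humidity samples (%), purely for illustration
rng = np.random.default_rng(0)
alts = rng.uniform(0, 20, 5000)
truth = rng.uniform(5, 95, 5000)
model = truth + rng.normal(0, 8, 5000)
print(rmse_by_altitude(model, truth, alts))
```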
One key application of EarthNet is data assimilation. While ML approaches now outperform NWP forecasts in many cases, they continue to depend on physics-based NWP initial states and inherit their associated runtime delays. Modeling 3D atmospheric structure also remains a challenge for operational models, and improvements there would be transformational for weather forecasting. EarthNet provides a path to an independent, accurate initial state, available sooner.
Another application of EarthNet is Earth observation processing. Expanding public and private investment is generating an unprecedented number of observation data streams. However, the processing and access pipelines, across both public and private sectors, are fragmented and populated with single-purpose models that must be built and tuned for specific use cases or even specific sensors. We believe the future of Earth data processing lies in general-purpose foundation models. Adding any new feature to EarthNet can improve existing features, while custom models ingesting proprietary data will be informed by the wealth of Earth intelligence already available.
Our future plans include increasing the model resolution from 16 km to 2 km and training the model to perform short-term forecasting using the Earth system dynamics it has already learned. We believe EarthNet has the potential to unlock transformational benefits from existing observing systems, which are currently bottlenecked by the limitations of operational DA systems and the fragmentation of the Earth observation data landscape. We believe that putting better Earth data in the hands of people and organizations will empower humanity to solve environmental challenges and build a healthy future.
Find more details in our paper: https://arxiv.org/abs/2407.11696
The computing used to train this model was provided by the NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center, accessed through SBIR award 80NSSC23CA169. This research also used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility, under NERSC award SBIR-ERCAP0030185. We thank NOAA, NASA, the European Space Agency (ESA), and the Korea Meteorological Administration (KMA) for data access. Software developed for this study leveraged open-source projects including PyTorch and Pangeo.