Heliophysics is central to understanding and forecasting space weather events and solar activity. Despite decades of high-resolution observations from the Solar Dynamics Observatory (SDO), most models remain task-specific and constrained by scarce labeled data, limiting their capacity to generalize across solar phenomena. We introduce Surya, a 366M-parameter foundation model for heliophysics designed to learn general-purpose solar representations from multi-instrument SDO observations, including eight Atmospheric Imaging Assembly (AIA) channels and five Helioseismic and Magnetic Imager (HMI) products. Surya employs a spatiotemporal transformer architecture with spectral gating and long–short range attention, pretrained on high-resolution solar image forecasting tasks and further optimized through autoregressive rollout tuning. Zero-shot evaluations demonstrate its ability to forecast solar dynamics and flare events, while downstream fine-tuning with parameter-efficient Low-Rank Adaptation (LoRA) shows strong performance on solar wind forecasting, active region segmentation, solar flare forecasting, and EUV spectra. Surya is the first foundation model in heliophysics that uses time advancement as a pretext task on full-resolution SDO data. Its novel architecture and performance suggest that the model is able to learn the underlying physics behind solar evolution.
Notes: 6 FLOP / parameter / token * 366 * 10^6 parameters * 1.3421773e+12 tokens [see dataset size notes] = 2.9474214e+21 FLOP
Size Notes: "In phase one, we trained Surya for 160,000 gradient descent steps on 128 NVIDIA A100 GPUs. The model is trained with batch size 1 (per GPU), making an effective batch size of 128." "Given a patch size of 16 × 16, we end up with N = 65,536 tokens" 65536 * 128 * 160000 = 1.3421773e+12 tokens
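The token and FLOP arithmetic above can be sketched as a short check. All figures come from the notes; the 6 FLOP/parameter/token factor is the standard rule of thumb for training compute (forward plus backward pass).

```python
# Worked check of the compute estimate in the notes.
params = 366e6                # Surya parameter count (366M)
tokens_per_sample = 65_536    # 16x16 patches over full-resolution SDO input
effective_batch = 128         # batch size 1 per GPU x 128 A100 GPUs
steps = 160_000               # phase-one gradient descent steps

# Total tokens seen during pretraining: ~1.3421773e12
tokens = tokens_per_sample * effective_batch * steps

# Standard training-compute approximation: 6 FLOP per parameter per token
flops = 6 * params * tokens   # ~2.9474e21 FLOP

print(f"tokens = {tokens:.7e}")
print(f"FLOP   = {flops:.7e}")
```

The exact token count is 1,342,177,280,000, which the notes round to 1.3421773e+12; the resulting compute, 2.947e+21 FLOP, matches the headline figure.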
Notes: 366M