FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (2306.12873v3)
Abstract: Over the past few years, owing to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have outperformed the European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25°. However, matching the ECMWF ensemble mean (EM) in 15-day forecasts remains a challenge. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time-step loss, a single model proves insufficient to achieve optimal performance at both short and long lead times. We therefore present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25°. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. Performance evaluation based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC) demonstrates that FuXi performs comparably to the ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to do so.
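The two evaluation metrics named in the abstract can be made concrete with a minimal NumPy sketch. This follows the standard WeatherBench-style definitions (cos-latitude weights normalized to mean 1); the function names and array layout are illustrative assumptions, not code from the FuXi release.

```python
import numpy as np

def latitude_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a regular lat-lon grid.

    forecast, truth: arrays of shape (n_lat, n_lon)
    lats_deg: 1-D array of grid latitudes in degrees, length n_lat
    """
    # Weight each latitude band by cos(lat), normalized to mean 1,
    # so high-latitude points (which cover less area) count for less.
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()
    sq_err = w[:, None] * (forecast - truth) ** 2
    return float(np.sqrt(sq_err.mean()))

def anomaly_correlation(forecast, truth, climatology, lats_deg):
    """Latitude-weighted anomaly correlation coefficient (ACC).

    Anomalies are deviations from a climatological mean field.
    """
    w = np.cos(np.deg2rad(lats_deg))
    w = (w / w.mean())[:, None]
    fa = forecast - climatology   # forecast anomaly
    ta = truth - climatology      # observed (analysis) anomaly
    num = (w * fa * ta).sum()
    den = np.sqrt((w * fa ** 2).sum() * (w * ta ** 2).sum())
    return float(num / den)
```

A perfect forecast yields an RMSE of 0 and an ACC of 1; a forecast no better than climatology yields an ACC near 0, which is why ACC is the standard skill measure at the long lead times the paper targets.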
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 
12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Preprint at https://arxiv.org/abs/2111.13587 (2022)
(14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, PMLR vol. 48, pp. 1050–1059 (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 
12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/2111.13587 (2022)
(14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 
13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). 
Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 
15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: Bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Garg, S., Rasp, S., Thuerey, N.: Weatherbench probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models. Preprint at https://arxiv.org/abs/2205.00865 (2022) (7) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 
13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). 
Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. 
In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. 
Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. 
In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. 
Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. 
In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol.
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
Preprint at https://arxiv.org/abs/2111.13587 (2022)
(14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. 
In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev.
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. 
In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. 
In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
- (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
- (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
- (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
- (22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
- (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
- (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
- (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
- (26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
- (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
- (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
- (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
- (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
- (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
- (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
- (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
- (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
- (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
- (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
- (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
- (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
- (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
- (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
- (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
- (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
- (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
- (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
- (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
- (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
- (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
- (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
- (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. 
R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. 
Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization.
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol.
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast. 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
- (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
- (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
- (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
- (28) Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
- (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
- (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
- (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
- (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
- (33) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
- (34) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
- (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
- (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
- (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
- (38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
- (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
- (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
- (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
- (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
- (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
- (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
- (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
- (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
- (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
- (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
- (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
- (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023)
- Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018)
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, PMLR 48, pp. 1050–1059 (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model.
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 
105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
- Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 
105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
- (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
- (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
- (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
- (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
- (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
- (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
- (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
- (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
- (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
- (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
- (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. 
npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
- Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)