TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations (2301.01901v3)
Abstract: Today's scientific simulations require significant data volume reduction because they produce enormous amounts of data while I/O bandwidth and storage space remain limited. Error-bounded lossy compression is considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work, which leverages only 1D compression, we propose an approach (TAC) that applies high-dimensional SZ compression to each refinement level of AMR data. To remove the data redundancy across levels, we propose several pre-processing strategies and select among them adaptively based on data features. We further optimize TAC into TAC+ by improving the lossless encoding stage of SZ compression so that it efficiently handles the many small AMR data blocks produced by pre-processing. Experiments on 10 AMR datasets from three real-world large-scale AMR simulations demonstrate that TAC+ improves the compression ratio by up to 4.9$\times$ under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.
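The idea of an error bound tuned per refinement level can be illustrated with a minimal sketch. It assumes plain uniform scalar quantization with an absolute error bound, which is a much simpler stand-in for the prediction, quantization, and lossless encoding stages of SZ used by TAC and TAC+. All function names, level numbers, and sample values below are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: uniform scalar quantization with an absolute
# error bound, applied independently to each AMR refinement level.
# This is NOT the SZ pipeline used by TAC/TAC+; names and data are hypothetical.

def quantize(values, error_bound):
    """Map each value to an integer code using bins of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def dequantize(codes, error_bound):
    """Reconstruct each value as the centre of its quantization bin."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

def compress_levels(levels, error_bounds):
    """Quantize every level with its own error bound.

    levels:       dict mapping level id -> list of floats
    error_bounds: dict mapping level id -> absolute error bound
    """
    return {lvl: quantize(vals, error_bounds[lvl]) for lvl, vals in levels.items()}

def decompress_levels(coded, error_bounds):
    """Invert compress_levels up to the per-level error bound."""
    return {lvl: dequantize(codes, error_bounds[lvl]) for lvl, codes in coded.items()}

def max_abs_error(original, restored):
    """Largest point-wise absolute difference between two equal-length lists."""
    return max(abs(a - b) for a, b in zip(original, restored))

# Hypothetical two-level dataset: level 0 is coarse, level 1 is fine.
levels = {
    0: [0.13, 1.72, -2.41, 3.05],
    1: [0.1312, 0.1274, 0.1335],
}
# A looser bound on the coarse level and a tighter bound on the fine level.
error_bounds = {0: 0.1, 1: 0.001}

coded = compress_levels(levels, error_bounds)
restored = decompress_levels(coded, error_bounds)
errors = {lvl: max_abs_error(levels[lvl], restored[lvl]) for lvl in levels}
```

In this sketch each level's reconstruction error stays within that level's own bound (up to floating-point rounding), so a looser bound can be given to one level and a tighter bound to another. Which levels deserve the tighter bound depends on the application-specific metric being preserved.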