Are Your Epochs Too Epic? Batch Free Can Be Harmful (2401.11347v1)
Abstract: Epoch-based memory reclamation (EBR) is one of the most popular techniques for reclaiming memory in lock-free and optimistic-locking data structures, due to its ease of use and good performance in practice. However, EBR is known to be sensitive to thread delays, which can result in performance degradation. Moreover, the exact mechanism for this performance degradation is not well understood. This paper illustrates this performance degradation in a popular data structure benchmark and does a deep dive to uncover its root cause: a subtle interaction between EBR and state-of-the-art memory allocators. In essence, modern allocators attempt to reduce the overhead of freeing by maintaining bounded thread caches of objects for local reuse, actually freeing them (a very high-latency operation) only when thread caches become too large. EBR immediately bypasses these mechanisms whenever a particularly large batch of objects is freed, substantially increasing overheads and latencies. Beyond EBR, many memory reclamation algorithms and data structures that reclaim objects in large batches suffer similar deleterious interactions with popular allocators. We propose a simple algorithmic fix for such algorithms that amortizes the freeing of large object batches over time, and apply this technique to ten existing memory reclamation algorithms, observing performance improvements for nine out of ten, and over 50% improvement for six out of ten, in experiments on a high-performance lock-free ABtree. We also present an extremely simple token-passing variant of EBR and show that, with our fix, it performs 1.5-2.6x faster than the fastest known memory reclamation algorithm, and 1.2-1.5x faster than not reclaiming at all, on a 192-thread, four-socket Intel system.
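The core idea of the fix, freeing a reclaimable batch gradually rather than all at once, can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions, not the paper's implementation: the class `AmortizedFreer`, its methods `retire`, `on_bag_safe`, and `on_operation`, the `frees_per_op` budget, and the `Node` type are all hypothetical names chosen here. The epoch tracking that decides *when* a bag becomes safe is elided; the sketch only shows what happens afterwards. Plain EBR would `delete` the entire bag at that moment, whereas this version queues it and releases a bounded number of objects per data-structure operation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Count of live Node objects, so the effect of freeing is observable.
std::size_t live_nodes = 0;

// Hypothetical payload type standing in for a data-structure node.
struct Node {
    long key;
    explicit Node(long k) : key(k) { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Hypothetical per-thread helper: amortizes freeing of objects that the
// reclamation scheme has already proven safe to free.
template <typename T>
class AmortizedFreer {
public:
    explicit AmortizedFreer(std::size_t frees_per_op) : frees_per_op_(frees_per_op) {
        assert(frees_per_op > 0 && "a zero budget would never drain the queue");
    }
    AmortizedFreer(const AmortizedFreer&) = delete;
    AmortizedFreer& operator=(const AmortizedFreer&) = delete;

    // Teardown only: release anything still held.
    ~AmortizedFreer() {
        for (T* p : limbo_) delete p;
        for (T* p : to_free_) delete p;
    }

    // Called when the data structure unlinks an object; it may still be
    // referenced by other threads, so it cannot be freed yet.
    void retire(T* p) { limbo_.push_back(p); }

    // Called when the (elided) epoch logic decides the whole limbo bag is safe.
    // Plain EBR would delete the entire bag here; we only enqueue it.
    void on_bag_safe() {
        to_free_.insert(to_free_.end(), limbo_.begin(), limbo_.end());
        limbo_.clear();
    }

    // Called once per data-structure operation: free at most frees_per_op_
    // queued objects, spreading the cost of a large batch over many operations.
    void on_operation() {
        for (std::size_t i = 0; i < frees_per_op_ && !to_free_.empty(); ++i) {
            delete to_free_.front();
            to_free_.pop_front();
        }
    }

    // Number of safe objects still waiting to be freed.
    std::size_t pending() const { return to_free_.size(); }

private:
    std::size_t frees_per_op_;
    std::vector<T*> limbo_;   // retired, not yet known to be safe
    std::deque<T*> to_free_;  // safe, waiting to be freed a few at a time
};
```

For example, with `AmortizedFreer<Node> f(2)`, retiring five nodes and then calling `f.on_bag_safe()` leaves five nodes pending, and each subsequent `f.on_operation()` frees at most two of them. The intent, following the abstract's explanation, is that a thread's frees arrive at a rate its allocator's bounded thread cache can absorb for local reuse, instead of a large burst that overflows the cache and forces the expensive path. The per-operation budget here is an arbitrary assumption, not a value taken from the paper.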
Authors: Daewoo Kim, Trevor Brown, Ajay Singh