(Poly)Logarithmic Time Construction of Round-optimal $n$-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI (2205.10072v2)

Published 20 May 2022 in cs.DC

Abstract: We give a fast(er), communication-free, parallel construction of optimal communication schedules that allow broadcasting of $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same, circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps per processor, each of which can compute its part of the schedule independently of the other processors in $O(\log p)$ space. The result is a significant improvement over the sequential $O(p \log^2 p)$ time and $O(p\log p)$ space construction of Träff and Ripke (2009) with considerable practical import. The round-optimal schedule construction is then used to implement communication optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the Message-Passing Interface), and significantly and practically improves over the implementations in standard MPI libraries (mpich, OpenMPI, Intel MPI) for certain problem ranges. The application to the irregular allgather operation is entirely new.
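The round count and the circulant pattern mentioned in the abstract can be illustrated with a minimal Python sketch. Only the formula $n-1+\lceil\log_2 p\rceil$ comes from the abstract. The function names, and in particular the skip set produced by `halving_skips` (repeatedly halving $p$ and rounding up), are assumptions made for illustration; the abstract does not say which skips the paper's circulant graph uses, and this sketch does not construct the actual block schedule.

```python
def optimal_rounds(p: int, n: int) -> int:
    """Round count n - 1 + ceil(log2 p) stated in the abstract."""
    if p < 2 or n < 1:
        raise ValueError("need at least 2 processors and 1 block")
    # (p - 1).bit_length() equals ceil(log2 p) for every integer p >= 1.
    return n - 1 + (p - 1).bit_length()


def halving_skips(p: int) -> list[int]:
    """ASSUMPTION: one plausible skip set with ceil(log2 p) entries,
    obtained by halving p with rounding up until 1 is reached."""
    skips = []
    s = p
    while s > 1:
        s = (s + 1) // 2
        skips.append(s)
    return sorted(skips)


def circulant_peers(r: int, p: int, skip: int) -> tuple[int, int]:
    """In a circulant pattern every processor r uses the same rule:
    send to (r + skip) mod p and receive from (r - skip) mod p."""
    return (r + skip) % p, (r - skip) % p


if __name__ == "__main__":
    p, n = 10, 4
    print("rounds:", optimal_rounds(p, n))
    for skip in halving_skips(p):
        to, frm = circulant_peers(0, p, skip)
        print(f"skip {skip}: processor 0 sends to {to}, receives from {frm}")
```

Because every processor applies the same send and receive rule, any processor can act as the root after a rotation of ranks, which is the property the abstract credits for making the schedules usable for allgather.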

References (21)
  1. Broadcasting multiple messages in the multiport model. IEEE Transactions on Parallel and Distributed Systems, 10(5):500–508, 1999.
  2. An optimal algorithm for computing census functions in message-passing systems. Parallel Processing Letters, 3(1):19–23, 1993.
  3. Optimal multiple message broadcasting in telephone-like communication systems. Discrete Applied Mathematics, 100(1–2):1–15, 2000.
  4. Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 8(11):1143–1156, 1997.
  5. Arthur M. Farley. Broadcast time in communication networks. SIAM Journal on Applied Mathematics, 39(2):385–390, 1980.
  6. Broadcasting multiple messages in the 1-in port model in optimal time. Journal of Combinatorial Optimization, 36(4):1333–1355, 2018.
  7. A new construction of broadcast graphs. Discrete Applied Mathematics, 280:144–155, 2020.
  8. An efficient heuristic for broadcasting in networks. Journal of Parallel and Distributed Computing, 66(1):68–76, 2006.
  9. Reproducible MPI benchmarking is still not as easy as you think. IEEE Transactions on Parallel and Distributed Systems, 27(12):3617–3630, 2016.
  10. Bin Jia. Process cooperation in multiple message broadcast. Parallel Computing, 35(12):572–580, 2009.
  11. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249–1268, 1989.
  12. Optimal broadcast and summation in the LogP model. In 5th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 142–153, 1993.
  13. Multiple message broadcasting in communication networks. Networks, 26:253–261, 1995.
  14. MPI Forum. MPI: A Message-Passing Interface Standard. Version 3.1, June 4th 2015. www.mpi-forum.org.
  15. Collective operations in NEC’s high-performance MPI libraries. In 20th International Parallel and Distributed Processing Symposium (IPDPS), page 100, 2006.
  16. Eunice E. Santos. Optimal and near-optimal algorithms for $k$-item broadcast. Journal of Parallel and Distributed Computing, 57(2):121–139, 1999.
  17. Jesper Larsson Träff. Brief announcement: Fast(er) construction of round-optimal $n$-block broadcast schedules. In 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 143–146. ACM, 2022.
  18. Jesper Larsson Träff. Fast(er) construction of round-optimal $n$-block broadcast schedules. In IEEE International Conference on Cluster Computing (CLUSTER), pages 142–151. IEEE Computer Society, 2022.
  19. Decomposing MPI collectives for exploiting multi-lane communication. In IEEE International Conference on Cluster Computing (CLUSTER), pages 270–280. IEEE Computer Society, 2020.
  20. Optimal broadcast for fully connected processor-node networks. Journal of Parallel and Distributed Computing, 68(7):887–901, 2008.
  21. A pipelined algorithm for large, irregular all-gather problems. International Journal of High Performance Computing Applications, 24(1):58–68, 2010.