
CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers (2302.08055v2)

Published 16 Feb 2023 in cs.AR

Abstract: Memory resources in data centers generally suffer from low utilization and a lack of dynamic allocation. Memory disaggregation addresses these problems by decoupling CPU and memory; current approaches are based either on RDMA or on interconnect protocols such as Compute Express Link (CXL). However, the RDMA-based approach requires code refactoring and incurs higher latency. The CXL-based approach supports native memory semantics and overcomes the shortcomings of RDMA, but it is limited to the rack level. In addition, memory pooling and sharing based on CXL products are still at an early stage of exploration and will take time to become available. In this paper, we propose CXL over Ethernet, an approach in which the host processor accesses remote memory with memory semantics over Ethernet. Our approach supports native memory load/store access and, by combining CXL and RDMA technologies, extends the physical range across servers and racks. We prototype our approach with one server and two FPGA boards connected by a 100 Gbps network and measure the memory access latency. Furthermore, we optimize the memory access path by adding a data cache and a congestion control algorithm in the critical path to further lower access latency. The evaluation results show that the average latency for the server to access remote memory is 1.97 μs, about 37% lower than the industry baseline latency. The latency can be further reduced to 415 ns for accesses that hit a cache block on the FPGA.
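The abstract reports two latency figures and a relative improvement without stating the baseline itself. The short Python sketch below works through that arithmetic under stated assumptions. It inverts "37% lower" to recover the implied baseline (roughly 3.1 μs, assuming the reduction is measured relative to the baseline), and it uses a simple one-level cache model to show how the average latency would move between 1.97 μs and 415 ns as the FPGA cache hit rate rises. The hit-rate model, the treatment of 1.97 μs as the miss cost, and the helper names `implied_baseline_ns` and `blended_latency_ns` are illustrative assumptions, not figures or methods taken from the paper.

```python
"""Back-of-the-envelope arithmetic on the latency figures in the abstract.

Assumptions (not from the paper): the 37% reduction is relative to the
industry baseline, and every access either hits the FPGA-side data cache
(415 ns) or pays the reported average remote latency (1.97 us) as a miss.
"""

REMOTE_AVG_NS = 1970.0  # reported average remote-access latency
CACHE_HIT_NS = 415.0    # reported latency for a cache hit on the FPGA
REDUCTION = 0.37        # "about 37% lower than the baseline"


def implied_baseline_ns(measured_ns: float, reduction: float) -> float:
    """Invert measured = baseline * (1 - reduction) to recover the baseline."""
    return measured_ns / (1.0 - reduction)


def blended_latency_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Average access time for a hypothetical one-level cache model."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


if __name__ == "__main__":
    baseline = implied_baseline_ns(REMOTE_AVG_NS, REDUCTION)
    print(f"implied baseline: {baseline:.0f} ns")
    for h in (0.0, 0.5, 0.9):
        avg = blended_latency_ns(h, CACHE_HIT_NS, REMOTE_AVG_NS)
        print(f"hit rate {h:.0%}: {avg:.1f} ns")
```

Because the abstract gives only averages, the blended figures are an upper-level estimate of the trend, not predictions of the prototype's measured behavior.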

