Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 134 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 29 tok/s Pro
GPT-5 High 31 tok/s Pro
GPT-4o 124 tok/s Pro
Kimi K2 204 tok/s Pro
GPT OSS 120B 432 tok/s Pro
Claude Sonnet 4.5 37 tok/s Pro
2000 character limit reached

SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization (2401.14727v1)

Published 26 Jan 2024 in cs.SE

Abstract: Code summarization aims to generate natural language descriptions of source code, facilitating programmers to understand and maintain it rapidly. While previous code summarization efforts have predominantly focused on method-level, this paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects. Unlike method-level code summarization,file-level code summarization typically involves long source code within a single file, which makes it challenging for Transformer-based models to understand the code semantics for the maximum input length of these models is difficult to set to a large number that can handle long code input well, due to the quadratic scaling of computational complexity with the input sequence length. To address this challenge, we propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences. Specifically, the SparseCoder employs a sliding window mechanism for self-attention to model short-term dependencies and leverages the structure message of code to capture long-term dependencies among source code identifiers by introducing two types of sparse attention patterns named global and identifier attention. To evaluate the performance of SparseCoder, we construct a new dataset FILE-CS for file-level code summarization in Python. Experimental results show that our SparseCoder model achieves state-of-the-art performance compared with other pre-trained models, including full self-attention and sparse models. Additionally, our model has low memory overhead and achieves comparable performance with models using full self-attention mechanism.

Citations (3)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 1 tweet and received 0 likes.

Upgrade to Pro to view all of the tweets about this paper: