Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 157 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 31 tok/s Pro
GPT-5 High 33 tok/s Pro
GPT-4o 88 tok/s Pro
Kimi K2 160 tok/s Pro
GPT OSS 120B 397 tok/s Pro
Claude Sonnet 4.5 35 tok/s Pro
2000 character limit reached

Octave-YOLO: Cross frequency detection network with octave convolution (2407.19746v1)

Published 29 Jul 2024 in cs.CV

Abstract: Despite the rapid advancement of object detection algorithms, processing high-resolution images on embedded devices remains a significant challenge. Theoretically, the fully convolutional network architecture used in current real-time object detectors can handle all input resolutions. However, the substantial computational demands required to process high-resolution images render them impractical for real-time applications. To address this issue, real-time object detection models typically downsample the input image for inference, leading to a loss of detail and decreased accuracy. In response, we developed Octave-YOLO, designed to process high-resolution images in real-time within the constraints of embedded systems. We achieved this through the introduction of the cross frequency partial network (CFPNet), which divides the input feature map into low-resolution, low-frequency, and high-resolution, high-frequency sections. This configuration enables complex operations such as convolution bottlenecks and self-attention to be conducted exclusively on low-resolution feature maps while simultaneously preserving the details in high-resolution maps. Notably, this approach not only dramatically reduces the computational demands of convolution tasks but also allows for the integration of attention modules, which are typically challenging to implement in real-time applications, with minimal additional cost. Additionally, we have incorporated depthwise separable convolution into the core building blocks and downsampling layers to further decrease latency. Experimental results have shown that Octave-YOLO matches the performance of YOLOv8 while significantly reducing computational demands. For example, in 1080x1080 resolution, Octave-YOLO-N is 1.56 times faster than YOLOv8, achieving nearly the same accuracy on the COCO dataset with approximately 40 percent fewer parameters and FLOPs.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 1 tweet and received 1 like.

Upgrade to Pro to view all of the tweets about this paper: