PVG: Progressive Vision Graph for Vision Recognition (2308.00574v2)

Published 1 Aug 2023 in cs.CV

Abstract: Convolution-based and Transformer-based vision backbones process images as grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Although Vision GNN (ViG) adopts graph-level features for complex images, it suffers from inaccurate neighbor node selection, expensive node information aggregation, and over-smoothing in deep layers. To address these problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), which introduces second-order similarity by gradually increasing the channels of the global graph branch and decreasing the channels of the local branch as the layers deepen; 2) a neighbor-node aggregation and update module based on Max pooling and mathematical Expectation (MaxE) to aggregate rich neighbor information; 3) Graph error Linear Unit (GraphLU), which enhances low-value information in a relaxed form to reduce the compression of image detail and alleviate over-smoothing. Extensive experiments on mainstream benchmarks demonstrate the superiority of PVG over state-of-the-art methods: PVG-S obtains 83.0% Top-1 accuracy on ImageNet-1K, surpassing the GNN-based ViG-S by +0.9 with 18.5% fewer parameters, while the largest PVG-B obtains 84.2%, a +0.5 improvement over ViG-B. Furthermore, PVG-S obtains +1.3 box AP and +0.4 mask AP over ViG-S on the COCO dataset.
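
To make the MaxE aggregation step concrete, below is a minimal PyTorch sketch of neighbor aggregation that combines max pooling with the mean (mathematical expectation) over a k-nearest-neighbor graph. The module name MaxEAggregation, the plain Euclidean kNN construction, the choice of k, and the linear fusion layer are illustrative assumptions based only on the abstract, not the paper's actual implementation (which also relies on PSGC's second-order similarity graph and the GraphLU activation).

```python
import torch
import torch.nn as nn


def knn_graph(x: torch.Tensor, k: int) -> torch.Tensor:
    """Return indices of the k nearest neighbors for every node.

    x: (B, N, C) node features. Plain Euclidean distance is an assumption
    here; the paper's PSGC instead builds the graph with second-order similarity.
    """
    dist = torch.cdist(x, x)                             # (B, N, N) pairwise distances
    return dist.topk(k, dim=-1, largest=False).indices   # (B, N, k) neighbor indices


class MaxEAggregation(nn.Module):
    """Hypothetical MaxE-style aggregation: pool neighbors with max and mean
    (mathematical expectation), then fuse with the center node feature."""

    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        self.k = k
        # Fuse center, max-pooled, and mean-pooled features back to `channels`.
        self.fuse = nn.Linear(3 * channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        idx = knn_graph(x, self.k)                            # (B, N, k)
        neighbors = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C),                # (B, N, N, C) candidate neighbors
            2,
            idx.unsqueeze(-1).expand(B, N, self.k, C),        # (B, N, k, C) selected indices
        )
        agg_max = neighbors.max(dim=2).values                 # max pooling over the k neighbors
        agg_mean = neighbors.mean(dim=2)                      # expectation over the k neighbors
        return self.fuse(torch.cat([x, agg_max, agg_mean], dim=-1))


if __name__ == "__main__":
    feats = torch.randn(2, 196, 64)           # e.g. 14x14 patch tokens with 64 channels
    print(MaxEAggregation(64)(feats).shape)   # torch.Size([2, 196, 64])
```

The intuition behind combining the two poolings is that the max captures the most salient neighbor response per channel while the mean preserves overall neighborhood statistics; how the paper's MaxE module actually weights or fuses these is not specified in the abstract.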
