1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop (2302.02651v1)

Published 6 Feb 2023 in cs.CV

Abstract: Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes. Existing PSG methods utilize one-stage paradigm which simultaneously generates scene graphs and predicts semantic segmentation masks or two-stage paradigm that first adopt an off-the-shelf panoptic segmentor, then pairwise relationship prediction between these predicted objects. One-stage approach despite having a simplified training paradigm, its segmentation results are usually under-satisfactory, while two-stage approach lacks global context and leads to low performance on relation prediction. To bridge this gap, in this paper, we propose GRNet, a Global Relation Network in two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings. To handle relation ambiguity and predicate classification bias caused by long-tailed distribution, we formulate relation prediction in the second stage as a multi-class classification task with soft label. We conduct comprehensive experiments on OpenPSG dataset and achieve the state-of-art performance on the leadboard. We also show the effectiveness of our soft label strategy for long-tailed classes in ablation studies. Our code has been released in https://github.com/wangqixun/mfpsg.

Citations (4)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

GitHub

GitHub - wangqixun/mfpsg: mask2former psg (22 stars)

1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop (2302.02651v1)

Summary

Related Papers

GitHub