Emergent Mind

Abstract

Semi-supervised video object segmentation (VOS) aims to segment a few moving objects in a video sequence, where these objects are specified by an annotation of the first frame. Many existing semi-supervised VOS methods use optical flow to improve segmentation accuracy. However, optical flow-based semi-supervised VOS methods cannot run in real time because optical flow estimation is computationally expensive. To address this problem, this study proposes FAMINet, which consists of a feature extraction network (F), an appearance network (A), a motion network (M), and an integration network (I). The appearance network outputs an initial segmentation result based on the static appearance of the objects. The motion network estimates the optical flow using very few parameters, which are optimized rapidly by an online memorizing algorithm named relaxed steepest descent. The integration network refines the initial segmentation result using the optical flow. Extensive experiments demonstrate that FAMINet outperforms other state-of-the-art semi-supervised VOS methods on the DAVIS and YouTube-VOS benchmarks and achieves a good trade-off between accuracy and efficiency. Our code is available at https://github.com/liuziyang123/FAMINet.
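The abstract does not spell out the relaxed steepest descent update, so the following is only a minimal sketch of the generic idea on a toy least-squares problem, not the authors' algorithm. It assumes a quadratic objective 0.5 * ||A x - b||^2, takes the exact line-search step along the negative gradient, and scales that step by a relaxation factor. The function name, the `relax` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def relaxed_steepest_descent(A, b, x0, relax=0.9, iters=100):
    """Minimize 0.5 * ||A x - b||^2 (toy objective, an assumption).

    Each iteration takes the exact line-search step for a quadratic
    and scales it by `relax` (hypothetical parameter, 0 < relax < 2).
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = A.T @ (A @ x - b)        # gradient of the objective
        Ag = A @ g
        denom = Ag @ Ag              # g^T (A^T A) g
        if denom == 0.0:             # gradient vanished: nothing left to do
            break
        alpha = (g @ g) / denom      # exact step size along -g
        x -= relax * alpha * g       # relaxed update
    return x

# Toy usage: recover a small parameter vector from consistent data.
A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([1.0, -2.0])
b = A @ x_true
x_hat = relaxed_steepest_descent(A, b, x0=np.zeros(2))
```

With only a handful of parameters, as the abstract describes for the motion network, each iteration of such a scheme costs a few small matrix-vector products, which is consistent with the claim that the parameters can be optimized rapidly online.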
