Emergent Mind

Character-focused Video Thumbnail Retrieval

(2204.06563)
Published Apr 13, 2022 in cs.CV and cs.LG

Abstract

We explore retrieving character-focused video frames as candidates for being video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are evaluated in two aspects: Facial-expression: We train a CNN model to measure whether a face has an acceptable facial expression for being in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails, from faces extracted from random frames of videos. Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s), and their interactions (co-occurrence) are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored based on the two criteria above, we infer frame level scores by combining the scores for all the faces within a frame.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.