UnLoc: A Unified Framework for Video Localization Tasks

Published 21 Aug 2023 in cs.CV and cs.LG | (2308.11062v1)

Abstract: While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task. We design a new approach for this called UnLoc, which uses pretrained image and text towers, and feeds tokens to a video-text fusion model. The outputs of the fusion module are then used to construct a feature pyramid in which each level connects to a head to predict a per-frame relevancy score and start/end time displacements. Unlike previous works, our architecture enables Moment Retrieval, Temporal Localization, and Action Segmentation with a single-stage model, without the need for action proposals, motion-based pretrained features or representation masking. Unlike specialized models, we achieve state-of-the-art results on all three different localization tasks with a unified approach. Code will be available at: https://github.com/google-research/scenic.

Authors (8)

Citations (38)

Summary

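The paper presents UnLoc, a unified framework that handles Moment Retrieval, Temporal Localization, and Action Segmentation with a single-stage model.
It feeds tokens from pretrained image and text towers to a video-text fusion module, then builds a feature pyramid whose heads predict a per-frame relevancy score and start/end time displacements.
The authors report state-of-the-art results on all three localization tasks, without action proposals, motion-based pretrained features, or representation masking.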
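
To make the head outputs named in the abstract more concrete, here is a minimal sketch of how a per-frame relevancy score and start/end time displacements could be decoded into candidate segments for one pyramid level. It is not the authors' implementation: the function names, the fixed frame stride, the score threshold, and the greedy non-maximum suppression step are all illustrative assumptions that the abstract does not specify.

```python
from typing import List, Tuple

# (start_sec, end_sec, score); a hypothetical container for this sketch.
Segment = Tuple[float, float, float]


def decode_segments(
    scores: List[float],
    start_disp: List[float],
    end_disp: List[float],
    frame_stride_sec: float = 1.0,   # assumed spacing between frames
    score_threshold: float = 0.5,    # assumed cutoff, not from the paper
) -> List[Segment]:
    """Turn per-frame predictions into candidate segments.

    Frame t is assumed to sit at time t * frame_stride_sec. A frame whose
    relevancy score reaches the threshold proposes the segment
    [time - start_disp[t], time + end_disp[t]].
    """
    segments: List[Segment] = []
    for t, score in enumerate(scores):
        if score < score_threshold:
            continue
        center = t * frame_stride_sec
        start = max(0.0, center - start_disp[t])  # clamp to video start
        end = center + end_disp[t]
        segments.append((start, end, score))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms(segments: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedy NMS (an assumed post-processing step): keep the highest-scoring
    segments and drop any that overlap a kept one too much."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.2]
    start_disp = [0.0, 1.0, 2.0, 0.0]
    end_disp = [0.0, 2.0, 1.0, 0.0]
    print(nms(decode_segments(scores, start_disp, end_disp)))
```

In this toy input, frames 1 and 2 both propose the same interval, so the suppression step keeps only the higher-scoring one. With a feature pyramid, the same decoding would presumably run per level with a level-specific stride before the candidates are merged.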

Overview of ICCV LaTeX Author Guidelines

The paper "LaTeX Author Guidelines for ICCV Proceedings" provides a comprehensive set of instructions for authors intending to submit manuscripts to the International Conference on Computer Vision (ICCV). It focuses on formatting requirements, submission protocols, and other key guidelines pertinent to ensuring manuscript compliance with ICCV standards.

Key Sections

Abstract and Introduction: The abstract should be italicized and occupy the top of the left-hand column, following strict formatting details such as font size and justification. The introduction outlines several essential directives concerning language, dual submissions, and paper length.
Submission Details: The manuscript must be in English and adhere to a strict eight-page limit, excluding references. Overlength submissions will not be reviewed. The paper must also follow the conventions that support impartial peer review, such as keeping the printed ruler in the review version so that reviewers can refer to specific lines, and maintaining author anonymity.
Formatting Specifications: Detailed instructions are provided for maintaining consistent margins, page numbering, and column widths. Authors are required to ensure complete adherence to these guidelines to prevent discrepancies during the review process. Specific typographic instructions are emphasized, such as font style and heading hierarchy.
Blind Review Protocols: The guidelines clarify common misconceptions about anonymizing submissions. Authors must omit identifiers and maintain objectivity when referencing their work. Special attention is given to technical reports and overlapping submissions, ensuring compliance with ICCV's double-blind review process.
Mathematical Content and Citations: Authors are instructed to number all sections and equations for reference purposes. The paper underscores the importance of a consistent citation format, particularly the use of \etal for multiple authors.
Figures and Tables: Instructions for including visual elements stress the importance of clarity and legibility. Authors must use appropriate font sizes and ensure graphics are perceivable in printed form.
Final Submission Requirements: Authors must submit a signed IEEE copyright release form with their final paper. Compliance with this prerequisite is mandatory for publication inclusion.

Implications and Future Directions

The guidelines underscore the necessity for meticulous document preparation, mitigating potential obstacles in the peer-review process. Adherence to these standards not only facilitates consistency but also supports the integrity and accessibility of conference materials.

As AI continues to evolve, conferences like ICCV play a crucial role in disseminating cutting-edge research. Ensuring a standardized submission process enhances the dissemination of knowledge and fosters robust academic discourse. Future developments might include automated tools for compliance verification, streamlining the preparation process and reducing administrative burdens on authors.

In summary, the "LaTeX Author Guidelines for ICCV Proceedings" are not merely procedural; they are foundational to the rigorous academic standards that underpin one of computer vision's most prominent conferences.
