
Large-Scale Study of Perceptual Video Quality (1803.01761v2)

Published 5 Mar 2018 in eess.IV

Abstract: The great variations of videographic skills, camera designs, compression and processing protocols, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, often commingled distortions that are impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Towards advancing NR video quality prediction, we constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding more than 205000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison of leading NR video quality predictors on it. This study is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download on this link: http://live.ece.utexas.edu/research/LIVEVQC/index.html .

Citations (219)

Summary

  • The paper introduces the LIVE-VQC database with 585 uniquely impaired videos captured from 43 different device models.
  • The paper employs a crowdsourcing methodology on Amazon Mechanical Turk to gather over 205,000 subjective quality scores from 4776 participants.
  • The paper evaluates state-of-the-art no-reference VQA models, with V-BLIINDS outperforming others, highlighting the need for more robust assessments.

Overview of the Large-Scale Study of Perceptual Video Quality

The paper presents a thorough study in the domain of Video Quality Assessment (VQA), focusing on no-reference (NR) video quality prediction in authentic, real-world settings. It acknowledges the complexity of video impairments that arise from variations in videographic technique, capture devices, and environmental conditions, and argues that existing NR models fall short because they rely on insufficiently diverse datasets. To bridge this gap, the research introduces a significant new resource, the LIVE Video Quality Challenge Database (LIVE-VQC), comprising 585 uniquely impaired videos captured by a broad pool of videographers using 43 different device models.

Methodological Contributions

The authors use a crowdsourcing approach on Amazon Mechanical Turk (AMT) to collect over 205,000 subjective video quality scores from 4776 participants, providing a substantial empirical foundation for evaluating NR VQA models. They address several complexities of this methodology, including participant diversity, bandwidth variability, and display device discrepancies, highlighting the importance of careful experimental design in handling the uncontrolled nature of online evaluations.
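
As a rough illustration of how such crowdsourced ratings are typically condensed into per-video Mean Opinion Scores (MOS), the sketch below aggregates raw ratings and applies a simple rater-screening step. The file name, column names, and rejection threshold are hypothetical illustrations and are not taken from the paper's actual processing pipeline.

```python
# Minimal sketch: aggregating crowdsourced ratings into per-video MOS.
# Assumes a hypothetical CSV "ratings.csv" with columns
# (worker_id, video_id, score), where score is on a 0-100 scale.
import pandas as pd

ratings = pd.read_csv("ratings.csv")

# Simple per-worker screening: drop raters whose scores correlate poorly
# with the provisional per-video means (a common crowdsourcing sanity check;
# the 0.5 threshold is an arbitrary placeholder).
ratings["provisional_mos"] = ratings.groupby("video_id")["score"].transform("mean")
worker_corr = ratings.groupby("worker_id").apply(
    lambda g: g["score"].corr(g["provisional_mos"])
)
kept_workers = worker_corr[worker_corr > 0.5].index
filtered = ratings[ratings["worker_id"].isin(kept_workers)]

# Mean Opinion Score (MOS), its standard deviation, and rating count per video.
mos = filtered.groupby("video_id")["score"].agg(["mean", "std", "count"])
mos = mos.rename(columns={"mean": "MOS", "std": "MOS_std", "count": "n_ratings"})
print(mos.head())
```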

Database Construction and Characteristics

The LIVE-VQC database extends the scope of existing VQA databases by including videos with authentic distortions of the kind encountered in practice, rather than distortions synthesized under controlled laboratory conditions. The videos reflect real-world variability: they were contributed by amateur videographers using many different device types, span a wide range of resolutions, and exhibit diverse spatial and temporal artifacts that arise naturally from varying capture conditions.

Empirical Evaluation of NR Models

Performance analyses of several blind VQA models—NIQE, BRISQUE, V-BLIINDS, and VIIDEO—show varied efficacy when benchmarked against the LIVE-VQC database. V-BLIINDS emerges as the superior model among those tested, demonstrating improved correlation with human quality assessments. Despite this, the overall performance across models indicates significant room for advancements in accurately modeling real-world video quality degradations.
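
For context, performance in such benchmarks is usually reported as Spearman (SROCC) and Pearson (PLCC) correlations between model predictions and MOS. The sketch below shows this evaluation on placeholder data; the paper's exact protocol (for example, any nonlinear logistic mapping applied before computing PLCC) may differ.

```python
# Minimal sketch of benchmarking an NR VQA model against subjective MOS.
# `predicted` and `mos` are placeholder arrays standing in for per-video
# model scores and crowdsourced MOS values.
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)
mos = rng.uniform(20, 90, size=585)            # placeholder subjective scores
predicted = mos + rng.normal(0, 10, size=585)  # placeholder model predictions

srocc, _ = spearmanr(predicted, mos)   # rank correlation: monotonic agreement
plcc, _ = pearsonr(predicted, mos)     # linear correlation with MOS
rmse = np.sqrt(np.mean((predicted - mos) ** 2))

print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}  RMSE={rmse:.2f}")
```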

Implications and Future Trajectories

The findings carry broad theoretical and practical implications. They underscore the necessity of developing more robust NR models capable of handling the intricacies of authentic distortions. The LIVE-VQC database thus serves not only as a benchmark for current VQA models but also as a solid foundation for the development of future algorithms. The paper envisages further exploration of the effects of sequential impairments and of techniques to mitigate bandwidth-related quality degradation during crowdsourced assessments.

In conclusion, the research contributes a valuable resource to the VQA domain, prompting a reevaluation of existing assumptions about video quality assessment and laying the groundwork for future innovations designed to handle the complexities of real-world video content in a manner that closely emulates human perceptual judgments.