Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 34 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 27 tok/s Pro
GPT-5 High 30 tok/s Pro
GPT-4o 80 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 461 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Self-Supervised Learning for Audio-Based Emotion Recognition (2307.12343v1)

Published 23 Jul 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieve consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU- MOSEI's acoustic modality. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data. Our model is first pretrained to uncover the randomly-masked timestamps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via several evaluation metrics against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics. This work shows the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small, and that the effect is most pronounced for emotions which are easier to classify such as happy, sad and anger. This work further demonstrates that self-supervised learning works when applied to embedded feature representations rather than the traditional approach of pre-training on the raw input space.

Citations (3)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.