Adapting WavLM for Speech Emotion Recognition

arXiv:2405.04485v1 Announce Type: new
Abstract: The use of self-supervised learning (SSL) speech models for downstream tasks has recently drawn considerable attention. While large pre-trained models commonly outperform smaller models trained from scratch, the optimal fine-tuning strategy remains an open question. In this paper, we explore fine-tuning strategies for the WavLM Large model on the speech emotion recognition task using the MSP Podcast Corpus. More specifically, we perform a series of experiments focused on using gender and semantic information from utterances. We then summarize our findings and describe the final model we submitted to the Speech Emotion Recognition Challenge 2024.

Latest Change: May 8, 2024, 7:31 a.m.