arXiv:2405.04485v1 Announce Type: new
Abstract: Recently, the use of self-supervised learning (SSL) speech models for downstream tasks has drawn considerable attention. While large pre-trained models commonly outperform smaller models trained from scratch, questions about the optimal fine-tuning strategy remain open. In this paper, we explore fine-tuning strategies for the WavLM Large model on the speech emotion recognition task using the MSP Podcast Corpus. More specifically, we perform a series of experiments focused on using gender and semantic information from utterances. We then summarize our findings and describe the final model we submitted to the Speech Emotion Recognition Challenge 2024.
Latest Change: May 8, 2024, 7:31 a.m.
CC:
No Creative Commons license