Abstract: Speech-based emotion recognition systems play a crucial part in audio conferencing. Machine learning techniques for speech-based emotion recognition are less robust because they are sensitive to noise, reverberation, accent changes, and language changes. The popular traditional Mel Frequency Cepstral Coefficient (MFCC) algorithm for feature extraction performs poorly in non-stationary background noise. To deal with this problem, this paper presents speech emotion recognition based on a deep learning approach. A multi-layer convolutional neural network (CNN), along with a simple K-nearest neighbor (KNN) classifier, is applied to classify various emotions such as happy, sad, neutral, disgust, and surprise. Extensive experimentation on a real-time database collected from the open-source social media platform YouTube shows that the combination of MFCC-CNN with a KNN classifier performs better than the existing MFCC algorithm.
Keywords: Speech Emotion Recognition, Deep Learning, Convolutional Neural Network, Mel Frequency Cepstral Coefficients (MFCC)