Enhancing Emotion Classification in Malayalam Accented Speech: An In-Depth Clustering Approach
DOI:
https://doi.org/10.53555/jaz.v44i5.2894
Keywords:
Speech Emotion Recognition (SER), Accented Speech Recognition, Clustering Techniques, Speech Feature Engineering
Abstract
Accurate emotion classification in accented Malayalam speech poses a distinct challenge for speech recognition. In this study, we apply several clustering algorithms to an accented Malayalam speech dataset and evaluate each using the Silhouette Score as a measure of cluster quality. Among the methods tested, Affinity Propagation performed best, achieving the highest Silhouette Score of 0.5255, which indicates well-defined and distinct groups. OPTICS and Mean Shift Clustering followed with scores of 0.4029 and 0.2511, respectively, indicating relatively distinct and well-formed clusters. We also introduce Ensemble Clustering (Majority Voting), which achieved a score of 0.2399, indicating moderate cluster distinction and offering a perspective on the potential advantages of ensemble methods in this context. These results show how effectively different clustering methods separate emotions in accented Malayalam speech. This study contributes to the advancement of speech recognition technology and lays the groundwork for further research in this area.
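The Silhouette Score used above to rank the clustering methods can be illustrated with a minimal sketch. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster; the point's coefficient is (b - a) / max(a, b), and the score is the mean over all points. The code below is a plain-Python illustration under stated assumptions: the function name, the Euclidean distance, and the four toy points are hypothetical and are not taken from the study's features or dataset.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = [0, 0, 1, 1]  # labels that follow the two groups
mixed_up = [0, 1, 0, 1]        # labels that cut across the two groups

print(silhouette_score(points, well_separated))
print(silhouette_score(points, mixed_up))
```

Scores near 1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster. On that scale, the reported 0.5255 for Affinity Propagation sits well above the 0.2399 of the ensemble.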
License
Copyright (c) 2023 Rizwana Kallooravi Thandil, Mohamed Basheer K.P

This work is licensed under a Creative Commons Attribution 4.0 International License.
