Model of the speaker identification and verification subsystem
DOI:
https://doi.org/10.18664/ikszt.v29i1.300905
Keywords:
machine learning, speaker diarization, classification, pseudo-ensemble, SincNet, LibriSpeech, LibriVox
Abstract
The paper addresses the pressing problem of authenticating and verifying speakers from voice information, which plays an important role in, for example, online or remote communication and information exchange in all spheres of life, including scientific communication. The aim of the paper is to create a model of a speaker identification and verification subsystem. To achieve this aim, the following tasks were accomplished: the connections between the modules of the proposed model were explained; the voice information analysis module was examined, with attention to keeping the system scalable as the number of users grows significantly; and the results were analyzed. The developed pseudo-ensemble-based neural network module was tested on a dataset prepared from the LibriSpeech corpus, an open English speech corpus built on the LibriVox project of volunteer-recorded audiobooks. The results of applying the developed module to the selected dataset show that, in order to implement the subsystem in a neural network training system, the proposed pseudo-ensemble should be trained for at least 120 epochs, with noise reduction applied at the audio sequence preprocessing stage.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.