Comparison of Different Neural Network Architectures for Spoken Language Identification

EasyChair Preprint 10680, version 2
5 pages • Date: August 15, 2023

Abstract

This paper compares different neural-network-based architectures on the spoken language identification task. To the best of our knowledge, such a comparison of different models on the same dataset and the same set of languages does not yet exist. We incorporate 7 different models, which include the latest architectures: a spectral-image-based ResNet model, a Convolutional Neural Network, a Bi-directional Long Short-Term Memory network, a Convolutional Recurrent Neural Network, Wav2Vec 2.0, a transformer and a conformer. We also tackle audio with background noise and music by training on data with similar acoustics. Finally, we show that our models generalize well to third-party data.

Keyphrases: Conformer, Language Identification, Wav2vec 2.0, neural networks, transformer