Browsing by Author "Hadj Moussa, KELLOU"

Voice Activity Detection Based on Machine/Deep Learning
(université Ghardaia, 2022) BESSEKHOUAD, Moussa; Hadj Moussa, KELLOU
Voice activity detection (VAD) identifies the speech and non-speech sections of an audio file and is a key component of many speech applications. Our VAD system is based on a deep learning approach and is trained to work with audio files in the Arabic language. Because real-world recordings are contaminated by many kinds of noise and other sounds, a VAD must cope with high noise levels, and for that reason this work builds on two different models. The first model receives noisy speech audio and attempts to remove or reduce the noise. It has a Redundant Convolutional Encoder-Decoder (R-CED) structure and is trained by receiving the spectra of the noisy speech file and generating the spectra of the enhanced speech file. The second model receives the enhanced speech file and classifies the audio into speech and non-speech sections. It has an artificial neural network (ANN) structure, receives the audio information directly, and is trained on the Common Voice Arabic corpus and the QUT-NOISE dataset. The resulting system reaches 90% accuracy at 5 dB SNR.
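
The abstract reports accuracy at 5 dB SNR but does not say how the noisy mixtures were built or how accuracy was counted. As an illustration only, the sketch below assumes the usual definition SNR_dB = 10 log10(P_speech / P_noise) and a per-frame comparison of speech/non-speech labels. The function names `mix_at_snr`, `measured_snr_db`, and `frame_accuracy` are hypothetical and are not taken from the thesis.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumption: SNR_dB = 10 * log10(P_speech / P_noise).
    Returns the noisy mixture and the scaled noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """SNR in dB of a clean signal against the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(scaled_noise))


def frame_accuracy(predicted, reference):
    """Fraction of frames whose speech (1) / non-speech (0) label matches."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Synthetic stand-ins for real recordings: a 440 Hz tone and Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
```

With real data, `speech` would come from a Common Voice clip and `noise` from a QUT-NOISE recording of equal length, and `frame_accuracy` would compare the classifier's per-frame output against reference labels.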