MFCC | DSP & Embedded Electronics

Heriyanto, Hartati and Putra, 2018

Qur’an recitation has distinctive characteristics, including special rules for reading and pronunciation known as the science of tajwid. Mistakes in recitation are common when knowledge of tajwid is limited, so reciters with a limited understanding of tajwid need tools that help check whether a recitation is correct. Checking a recitation means verifying it against these rules. Voice identification studies to date have had problems with feature extraction, compatibility (suitability) testing, and accuracy. This study addresses these issues in two stages. The first stage extracts the sound characteristics of the Qur’an recitation; the second stage tests the conformity of the recitation and its accuracy. In the first stage, feature extraction uses MFCC and Normalization of Dominant Weight (NDW). The reference table of recitation characteristics is taken from one Qur’an reader who is competent in tajwid, and recordings from 5-7 people are sampled as test sources. The second stage, conformity testing, starts with filtering, followed by sequential multiplication of the reference table and the Conformity Uniformity Pattern (CUP). The conformity test sample is taken from 11 Qur’anic letters containing 8 reading laws, for a total of 886 records. The tests vary the dominant frame, the number of cepstral coefficients, and the number of frames. The conformity test gives an average accuracy of 91.37% with nine dominant frames. For the number of cepstral coefficients, c-23 gives an average of 96.65%; for the number of frames, F-10 gives the best average, also 96.65%.

[https://doi.org/10.14738/aivp.62.4268]
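
The abstract above names MFCC as the feature extractor but does not spell out the pipeline. As a rough illustration, the following is a minimal sketch of a textbook MFCC computation in NumPy: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, logarithm, and DCT. The frame length, hop size, filter count, number of coefficients, the 0.97 pre-emphasis factor, and all function names are illustrative assumptions, not the authors' settings. The NDW and CUP steps are not covered.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) matrix of cepstral coefficients.

    All default parameter values are assumptions chosen for illustration.
    The signal must contain at least frame_len samples.
    """
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filterbank energies; the floor avoids log(0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Unnormalized DCT-II over the filter axis, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return log_energies @ basis.T

# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
features = mfcc(tone, sample_rate)
print(features.shape)
```

Each row of `features` describes one frame, so experiments like those in the abstract (varying the number of frames or the number of cepstral coefficients) would correspond to changing `hop`/`frame_len` or `n_ceps` in this sketch.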

This research develops a method for identifying voice utterances that have changed through aging, over intervals of 10 to 25 years. Features of such utterances can be extracted with MFCC (Mel Frequency Cepstrum Coefficient), but the compatibility of the features may drop to 55%, whereas utterances unaffected by aging may reach 95%. To improve the compatibility of features from aging-affected voices, a more specific feature extraction method is developed that separates the voice into several channels. This method, called multichannel MFCC, consists of multichannel 5 filterbank (M5FB), multichannel 2 filterbank (M2FB), and multichannel 1 filterbank (M1FB). The test results show that M5FB and M2FB achieve the highest compatibility at the 25-year interval, with 85% and 82% respectively, while M5FB achieves the highest score, 86%, at the 10-year interval.
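
The abstract does not say how the channels are formed. One possible reading, shown here purely as an assumption, is to split the log mel filterbank outputs into contiguous frequency bands and compute cepstral coefficients per band. The sketch below illustrates that idea only; it is not the authors' M5FB/M2FB/M1FB definition, and the function names, the random placeholder input, and the band and coefficient counts are all hypothetical.

```python
import numpy as np

def dct_ii(x, n_ceps):
    """Unnormalized DCT-II along the last axis, keeping the first n_ceps terms."""
    n = x.shape[-1]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n) + 0.5) / n)
    return x @ basis.T

def multichannel_cepstra(log_mel, n_channels, n_ceps_per_channel):
    """Split filterbank outputs into contiguous bands and take cepstra per band.

    log_mel has shape (n_frames, n_filters). Returns a list with one
    (n_frames, n_ceps) array per channel. This grouping is an assumed
    interpretation of "multichannel", not a published definition.
    """
    channels = np.array_split(log_mel, n_channels, axis=1)
    return [dct_ii(ch, min(n_ceps_per_channel, ch.shape[1])) for ch in channels]

# Placeholder input: 10 frames x 25 log filterbank energies (random values,
# standing in for the output of a real mel filterbank).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(10, 25))

per_channel = multichannel_cepstra(log_mel, n_channels=5, n_ceps_per_channel=4)
single_channel = multichannel_cepstra(log_mel, n_channels=1, n_ceps_per_channel=13)
```

With `n_channels=1` the function reduces to an ordinary cepstral transform over the whole filterbank, which makes it easy to compare a single-channel baseline against a banded variant.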
