Automatic Segmentation of Indonesian Speech into Syllables using Fuzzy Smoothed Energy Contour with Local Normalization, Splitting, and Assimilation

This paper discusses using the short-term energy contour of a speech signal, smoothed by a fuzzy-based method, to automatically segment the speech into syllabic units. Two additional procedures, local normalization and postprocessing, are proposed to improve the method. Tests on an Indonesian speech dataset show that local normalization significantly improves the accuracy of fuzzy smoothing. In the postprocessing step, splitting missed short syllables reduces deletion errors but increases insertion errors. Conversely, assimilating a single-consonant segment into its previous or next segment reduces insertion errors but increases deletion errors. Applying splitting and then assimilation in sequence yields a fairly significant improvement in accuracy and a reduction in deletion errors, at the cost of a slight increase in insertion errors.

(Journal of ICT Research and Applications, ITB, 2014)
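
To make the pipeline in the abstract more concrete, the following is a minimal sketch of energy-contour segmentation with local normalization. It is not the authors' implementation. The abstract does not specify the fuzzy smoothing rule, so a plain centered moving average stands in for it here. The definition of local normalization (dividing by the maximum in a surrounding window), the use of local minima as candidate boundaries, and all function names and default parameter values are assumptions made for illustration. The splitting and assimilation postprocessing steps are not sketched.

```python
def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def moving_average(values, width):
    """Centered moving average (assumed stand-in for fuzzy smoothing)."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_normalize(values, width):
    """Divide each value by the maximum inside a surrounding window.

    Assumed form of local normalization: it rescales quiet regions so
    that a soft syllable is judged against its neighbours rather than
    against the loudest part of the utterance.
    """
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        peak = max(values[lo:hi])
        out.append(values[i] / peak if peak > 0 else 0.0)
    return out


def boundary_frames(contour):
    """Frame indices of local minima, used as candidate boundaries."""
    return [
        i
        for i in range(1, len(contour) - 1)
        if contour[i] < contour[i - 1] and contour[i] <= contour[i + 1]
    ]


def candidate_boundaries(signal, frame_len=160, hop=80,
                         smooth_width=5, norm_width=15):
    """Chain the steps; the default parameters are illustrative only."""
    energy = short_term_energy(signal, frame_len, hop)
    smoothed = moving_average(energy, smooth_width)
    normalized = local_normalize(smoothed, norm_width)
    return boundary_frames(normalized)
```

In this sketch, `candidate_boundaries` returns frame indices; multiplying an index by `hop` gives the approximate sample position of the boundary. A real system would tune the window widths to the speaking rate and then apply a postprocessing stage such as the splitting and assimilation steps described in the abstract.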