Telling compounds and phrases apart in Vietnamese. A random forest classification

Sandrien van Ommen* (University of Geneva), Catalina Torres (UZH ISLE), Lâm Quang Đông (University of Languages and International Studies Vietnam National University), Anne-Lise Giraud (University of Geneva; Université Paris Cité) and Balthasar Bickel (UZH ISLE) published a paper on the phonetic distinguishability of compounds vs. phrases in Vietnamese. 

Abstract:
Vietnamese is an isolating language with rich productive compounding, but no morphosyntactic, phonotactic or phonological evidence to assume a linguistic level between the syllable and the phrase (Schiering et al. 2010). Following Nguyen and Ingram (2007), we model an artificial listener with a Random Forest Classifier to study the phonetic distinguishability of compounds vs. phrases. This Machine Learning algorithm represents the maximal potential for a system to differentiate the two classes based on phonetics alone. It also ranks the importance of each phonetic correlate to the differentiation of these classes. This allows an interpretation that goes beyond whether a difference on a particular phonetic dimension exists and shows how important that difference is. The results confirm that the two classes can only be phonetically separated under circumstances of maximal contrast, and that maximal contrast is realized through juncture marking. Furthermore, we show that the two classes cannot be perfectly separated even under conditions of maximal contrast, and that there is an across-the-board preference for a compound interpretation from the phonetic data, even when the Random Forest Classifier was trained on maximal contrast data.
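As a rough illustration of the idea described in the abstract, the sketch below trains a very small random forest on synthetic data and ranks the features by importance. It is not the authors' pipeline or data. The feature names (`juncture_pause`, `f0_range`, `duration_ratio`), the toy data generator, and all parameter values are assumptions made up for this example, and each tree is reduced to a single split (a decision stump) to keep the code short, whereas a full random forest grows deeper trees.

```python
import random
from collections import Counter

# Hypothetical feature names, invented for this illustration only.
FEATURES = ["juncture_pause", "f0_range", "duration_ratio"]


def make_toy_data(n=200, seed=1):
    """Synthetic data: only the first feature carries class information."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = "phrase" if i % 2 else "compound"
        pause = rng.gauss(1.0 if label == "phrase" else 0.0, 0.5)
        rows.append([pause, rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append(label)
    return rows, labels


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(rows, labels, features):
    """Return the split with the largest Gini decrease among `features`."""
    parent, n, best = gini(labels), len(rows), None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def train_forest(rows, labels, n_trees=50, n_sub=2, seed=0):
    """Bagged stumps with random feature subsets; importance = summed Gini decrease."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    stumps, importance = [], [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(range(n_features), n_sub)    # random feature subset
        split = best_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        if split is None:  # no usable split in this sample
            continue
        gain, f, t, left_label, right_label = split
        stumps.append((f, t, left_label, right_label))
        importance[f] += gain
    total = sum(importance) or 1.0
    return stumps, [v / total for v in importance]  # normalized to sum to 1


def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in stumps)
    return votes.most_common(1)[0][0]


rows, labels = make_toy_data()
stumps, importances = train_forest(rows, labels)
# Accuracy on the training data itself, so it is an optimistic estimate.
accuracy = sum(predict(stumps, r) == y for r, y in zip(rows, labels)) / len(rows)

for name, score in zip(FEATURES, importances):
    print(f"{name}: {score:.2f}")
print(f"training accuracy: {accuracy:.2f}")
```

Because the toy classes overlap on the one informative feature, the forest cannot separate them perfectly, and the importance ranking shows which feature the splits relied on. Any numbers it prints describe only this synthetic data, not the findings of the paper.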

Open Access link

DOI: 10.6519/TJL.202509_23(3).0002

