For example, there are vowel and consonant phonemes in Chinese:
vowels = ['a', 'ai', 'an', 'ang', 'ao', 'e', 'ei', 'en', 'eng', 'er', 'i', 'ia', 'ian', 'iang', 'iao', 'ie', 'ii', 'iii', 'in', 'ing', 'iong', 'iou', 'o', 'ong', 'ou', 'u', 'ua', 'uai', 'uan', 'uang', 'uei', 'uen', 'ueng', 'uo', 'v', 'van', 've', 'vn', 'zh']
consonants = ['b', 'c', 'ch', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 'sh', 'sp', 'sil', 't', 'x', 'z']
Suppose I have tri-phones of the form 'a-b+c', where a, b and c are the previous, current and following phonemes.
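To make the notation concrete, here is a minimal sketch of how such a label splits into its three parts. The helper name split_triphone is only illustrative, and it assumes every label contains exactly one '-' followed later by one '+':

```python
def split_triphone(triphone):
    """Split 'prev-cur+next' into a (prev, cur, next) tuple."""
    # Everything before the first '-' is the previous phoneme.
    prev, rest = triphone.split('-', 1)
    # The remainder is 'cur+next'.
    cur, nxt = rest.split('+', 1)
    return prev, cur, nxt
```

So split_triphone('z-en+iang') gives the previous phoneme 'z', the current phoneme 'en' and the following phoneme 'iang'.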
I want to use a regex to find the tri-phones that contain adjacent vowels, i.e. those of the form vowel-vowel+* or *-vowel+vowel.
For example:
Match: zh-uei+x, b-ai+vn, e-uang+x
Don't match: sil-z+ai, vn-l+v, x-ia+f
I am using this code:
import re
v = '|'.join(vowels)  # Or v = '^'+'|'.join(consonants)
p = r'({0}\-{0}\+.*)|(.*\-{0}\+{0})'.format(v)
However, re.match(p, 'z-en+iang') still does not match, even though 'en' and 'iang' are both vowels.
How can I fix it? Thanks.
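One likely cause is operator precedence: '|' binds more loosely than anything else in a regex, so once the joined vowels are substituted in, the pattern reads as a long list of unrelated alternatives (a, ai, ..., zh\-a, ai, ...) rather than "vowel, dash, vowel". A possible fix, sketched below, is to wrap the alternation in a non-capturing group and match the whole string. The names any_phone and has_adjacent_vowels are my own, and the sketch assumes phoneme labels never contain '-' or '+'. The vowels list is copied from above as-is, including 'zh'.

```python
import re

vowels = ['a', 'ai', 'an', 'ang', 'ao', 'e', 'ei', 'en', 'eng', 'er', 'i',
          'ia', 'ian', 'iang', 'iao', 'ie', 'ii', 'iii', 'in', 'ing', 'iong',
          'iou', 'o', 'ong', 'ou', 'u', 'ua', 'uai', 'uan', 'uang', 'uei',
          'uen', 'ueng', 'uo', 'v', 'van', 've', 'vn', 'zh']

# Group the alternation so it behaves as a single "one vowel" unit.
# Longest labels go first so 'iang' is tried before 'i'.
v = '(?:' + '|'.join(sorted(map(re.escape, vowels), key=len, reverse=True)) + ')'

# Any phoneme label: one or more characters that are not '-' or '+'.
any_phone = r'[^-+]+'

# vowel-vowel+*  or  *-vowel+vowel
p = re.compile(r'{v}-{v}\+{a}|{a}-{v}\+{v}'.format(v=v, a=any_phone))


def has_adjacent_vowels(triphone):
    # fullmatch anchors at both ends, so 'en' cannot match just the
    # prefix of a longer label.
    return p.fullmatch(triphone) is not None
```

With this, has_adjacent_vowels('z-en+iang') is True, and so are the three "Match" examples, while the three "Don't match" examples return False.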