Chinese Syllable Structure
- Jisheng Zhang, East China Normal University
Summary
Chinese is generally considered a monosyllabic language in that one Chinese character corresponds to one syllable and vice versa, and most characters can be used as free morphemes, although there is a tendency for words to be disyllabic. On the one hand, the syllable structure of Chinese is simple, as far as permissible sequences of segments are concerned. On the other hand, complexities arise when the status of the prenuclear glide is concerned and with respect to the phonotactic constraints between the segments. The syllabic affiliation of the prenuclear glide in the maximal CGVX Chinese syllable structure has long been a controversial issue.
Traditional Chinese phonology divides the syllable into shengmu (C) and yunmu, the latter consisting of medial (G), nucleus (V), and coda (X), which is either a high vowel (i/u) or a nasal (n/ŋ). This is known as the sheng-yun model, which translates to initial-final in English (IF for short). The traditional Chinese IF syllable model differs from the onset-rhyme (OR) syllable structure model in several respects. In the former, the initial consists of only one consonant, excluding the glide, and the final (that is, everything after the initial consonant) is not the poetic rhyming unit, which excludes the prenuclear glide; in the latter, the onset includes a glide and the rhyme (that is, everything after the onset) is the poetic rhyming unit.
The Chinese traditional IF syllable model is problematic in itself. First, the final is ternary branching, which is not compatible with the binary principle in contemporary linguistics. Second, the nucleus+coda, as the poetic rhyming unit, is not structured as a constituent. Accordingly, the question arises of whether Chinese syllables can be analyzed in the OR model.
Many attempts have been made to analyze the Chinese prenuclear glide in the light of current phonological theories, particularly in the OR model, based on phonetic and phonological data on Chinese. Some such studies have proposed that the prenuclear glide occupies the second position in the onset. Others have proposed that the glide is part of the nucleus. Yet, others regard the glide as a secondary articulation of the onset consonant, while still others think of the glide as an independent branch directly linking to the syllable node. Also, some have proposed an IF model with initial for shengmu and final for yunmu, which binarily branches into G(lide) and R(hyme), consisting of N(ucleus) and C(oda). What is more, some have put forward a universal X-bar model of the syllable to replace the OR model, based on a syntactic X-bar structure. So far, there has been no authoritative finding that has conclusively decided the Chinese syllable structure.
Moreover, the syllable is the cross-linguistic domain for phonotactics. The number of syllables in Chinese is much smaller than in many other languages, mainly because of the complicated phonotactics of the language, which strictly govern the segmental relations within CGVX. In the X-bar syllable structure, the Chinese phonotactic constraints which configure segmental relations in the syllable domain mirror the theta rules which capture the configurational relations between specifier and head and between head and complement in syntax. On the whole, analysis of the complexities of the Chinese syllable will shed light on the cross-linguistic representation of syllable structure, making a significant contribution to phonological typology in general.
Subjects
- Linguistic Theories
- Phonetics/Phonology
1. Introduction to Chinese Syllables
In standard Chinese or Mandarin Chinese (henceforth: Chinese), one orthographic character (字) represents one syllable, corresponding to one morpheme. This is a free morpheme in most cases, although words in use tend to be disyllabic, so that Chinese is known as a monosyllabic language in terms of one syllable being one morpheme.1 The syllable structure is simple in terms of segment sequences, i.e., maximally CGVX, where C is a consonant, G is a glide, V is the nuclear vowel, and X is either a high vowel (i/u) or a nasal (n/ŋ). A grammatical syllable in Chinese can be any one of the eleven types in Table 1:
Table 1. Syllable Types in Chinese
| | IPA | Pinyin | Chinese characters | Gloss | Syllable types |
---|---|---|---|---|---|
a. | [ɤ] | e | 鹅 | goose | V |
b. | [an] | ɑn | 安 | peace | VC |
c. | [au] | ɑo | 凹 | concave | VV |
d. | [la] | lɑ | 拉 | pull | CV |
e. | [lan] | lɑn | 兰 | blue | CVC |
f. | [lau] | lɑo | 老 | old | CVV |
g. | [luo] | luo | 骡 | mule | CGV |
h. | [luan] | luɑn | 卵 | egg | CGVC |
i. | [liau] | liɑo | 聊 | chat | CGVV |
j. | [wan] | wɑn | 玩 | play | GVC |
k. | [ja] | yɑ | 牙 | tooth | GV |
Note: This table was created by the author, based on factual data on Chinese. In Chinese every full-tone syllable is stressed, and a stressed syllable is bimoraic so that a monophthong rime is equal to VV and VN rimes in terms of length (Duanmu, 2007, pp. 71–84). Pinyin is the system of romanization used to represent the Chinese phonetic notation. In the table, tones are omitted because only segmental structure is of interest here. In fact, every lexical syllable has one of four distinctive tones. The same syllable with the same tone can have different Chinese characters representing different morphemes. For example, [lu51] (where [51] is a falling tone) can be the phonological structure for 路 meaning ‘road’, for 鹿 meaning ‘deer’, for 陆 meaning ‘land’, and so on.
Table 1 shows that there are 11 syllable types in Chinese, with CGVX as their maximal structure and V as their minimal structure. In Chinese a V syllable (“a” in Table 1) and a CGVC syllable (“h” in Table 1) are equally heavy in terms of moras. Phonologically, the weight and length of Chinese syllables are decided mainly by tones instead of segmental structure; this is because a stressed syllable has a full tone and is heavy, while an unstressed syllable has no tone and is light (Duanmu, 2014, p. 428). The moraic structure of Chinese syllables can be presented as follows (following Duanmu, 1990, pp. 78–79; Zec, 2007, pp. 174–175; Zhang, 2006, p. 158):2

Figure 1. Moraic structure of Chinese syllables.
In Figure 1, the small, raised digits indicate tones and “0” refers to the absence of tone, also called neutral tone. Any full-toned syllable is bimoraic, and a neutral-tone syllable is monomoraic (“d” in Figure 1). In other words, the initial consonant and the prenuclear glide are nonmoraic, and the weight of a syllable only involves the tone-bearing unit (TBU), which can be V, VV, or VC. However, this article only concerns segmental structure, regardless of tones.
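The CGVX template and the mora count just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment classes (`GLIDES`, `HIGH_VOWELS`, `NON_HIGH_VOWELS`, `CODAS`) are a simplified inventory chosen here, each segment is written as a single character, and the function names are hypothetical rather than taken from the literature.

```python
# Simplified, assumed segment classes (one character per segment).
GLIDES = set("iuyjw")          # segments that may fill the G slot
HIGH_VOWELS = set("iuy")
NON_HIGH_VOWELS = set("aoeɤə")
CODAS = set("iunŋ")            # X is a high vowel (i/u) or a nasal (n/ŋ)


def syllable_type(syllable: str) -> str:
    """Return the Table 1 style type (e.g. 'CGVC') of a toneless syllable."""
    segs = list(syllable)
    # The nucleus is the first non-high vowel; failing that, the first high vowel.
    nucleus = next((i for i, s in enumerate(segs) if s in NON_HIGH_VOWELS), None)
    if nucleus is None:
        nucleus = next((i for i, s in enumerate(segs) if s in HIGH_VOWELS), None)
    if nucleus is None:
        raise ValueError(f"no nucleus found in {syllable!r}")

    before, after = segs[:nucleus], segs[nucleus + 1:]
    shape = ""
    # A glide immediately before the nucleus fills the G slot.
    has_glide = bool(before) and before[-1] in GLIDES
    if has_glide:
        before = before[:-1]
    if len(before) > 1:
        raise ValueError("at most one initial consonant is allowed")
    if before:
        shape += "C"
    if has_glide:
        shape += "G"
    shape += "V"
    if len(after) > 1:
        raise ValueError("at most one segment may follow the nucleus")
    if after:
        if after[0] not in CODAS:
            raise ValueError(f"{after[0]!r} is not a permissible X")
        # Table 1 writes a vocalic X as V and a nasal X as C.
        shape += "V" if after[0] in HIGH_VOWELS else "C"
    return shape


def mora_count(full_tone: bool) -> int:
    """Full-toned syllables are bimoraic; neutral-tone syllables are monomoraic."""
    return 2 if full_tone else 1


for s in ["ɤ", "an", "au", "la", "lan", "lau", "luo", "luan", "liau", "wan", "ja"]:
    print(s, syllable_type(s))
```

The mora count deliberately ignores the segmental shape, reflecting the point that weight depends on tone rather than on how many of the four slots are filled.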
Table 1 also shows that if syllable-final X is a high vowel, CGVV is possible, as in [liau] (聊) ‘chat’. Traditionally, G(lide) is represented as [i] or [u] when preceded by C (as in “g,” “h,” and “i” in Table 1) and as a glide [j] or [w] when it is the first segment of the syllable (as in “j” and “k” in Table 1). The representation in Table 1 follows this traditional Chinese practice so that readers do not take it for granted that the glide is excluded from the rhyme. Regardless of the transcription, the maximal syllable structure in Chinese has only four timing slots, CGVX, each filled with one segment. In traditional Chinese phonology, the internal structure of a syllable is presented as in Figure 2 (according to Xue, 1986, pp. 5, 10).

Figure 2. A Chinese syllable in traditional terms.

Figure 3. A Chinese syllable with English labels.
Figure 2 shows that a Chinese syllable is composed of two parts: shengmu and yunmu. The former is only one consonant, and the latter consists of maximally three elements: yuntou, which is a medial; yunhe, which is a vowel; and yunwei, which can be a nasal (n/ŋ) or a high vowel (i/u). Chinese shengmu can be translated into English as ‘initial’, yunmu as ‘final’; yuntou is the prenuclear glide, yunhe is the nucleus, and yunwei means coda. So the two parts of a Chinese syllable can be referred to as initial and final, and hence this model is referred to as the IF model. The English equivalent of the IF model is presented in Figure 3.
2. The Complexity of the Chinese Syllable Structure
The traditional Chinese syllable structure initial-final (IF) model differs from the onset-rhyme (OR) syllable structure model (see Pike & Pike, 1947).3 IF differs from OR in several aspects: Firstly, the initial is not the same as the onset because the initial of a Chinese syllable can only consist of one consonant, excluding the glide, as Figure 3 shows. Secondly, everything after the initial belongs to the final, which is ternary branching with a glide as a medial, a vowel as a nucleus, and a coda at the end.4 Thirdly, the coda can be either a high vowel or a nasal, so that if a syllable ends in a diphthong, the second vocalic element forms the coda in IF, while it would be analyzed as part of the nucleus in OR. To compare the Chinese IF model with the OR model, consider the representations of Chinese [suei] (岁) ‘age’ in IF and English [sweɪ] ‘sway’ in OR, presented in Figure 4 and Figure 5, respectively.

Figure 4. Chinese [suei] in IF.

Figure 5. English [sweɪ] in OR.
Figures 4 and 5 show that the syllabic structure of Chinese [suei] in IF and that of English [sweɪ] in OR are very different, though they are similar in terms of segment sequences. Chinese does not have consonant clusters, and the glide is independent of the initial, as shown in Figure 4. In traditional Chinese phonology, a syllable like [wan] is GVC, as presented in Table 1, not CVC like English [wɪn] ‘win’, because it is assumed that there is a zero shengmu when the first segment of a syllable is a glide or a vowel (Wang, 1998/1963, p. 18). For example, the syllables [wan] ‘play’ and [ai] ‘love’ have a zero shengmu in traditional Chinese phonology, as presented in Figures 6 and 7.

Figure 6. The syllable structure of [wan].

Figure 7. The syllable structure of [ai].
Two arguments have been advanced for the assumption of zero shengmu in the Chinese syllable structure. The first is that Modern Chinese developed from Middle Chinese (ca. 1000 AD), in which all syllables had one of 36 different shengmu consonants. The second is that in Modern Chinese liaison does not occur between a syllable ending in a nasal and a vowel-initial syllable in a compound word, as exemplified in (1).5
(1)
The data in (1) have led some scholars to think that liaison does not occur between a syllable ending in a nasal and a syllable starting with a vowel, because liaison is blocked by the zero shengmu in between. However, in contemporary phonology, some scholars (see Zhang, 2006, 2020) argue against the assumption of zero shengmu in Chinese. Zhang (2020, pp. 213–218) argues that liaison is not a universal phonological rule, so that if liaison does not occur between a word ending with a consonant and a word starting with a vowel, this does not necessarily mean there is a zero/empty onset. In German, for example, word boundaries must also be aligned with syllable boundaries: liaison does not take place although there is no evidence for a zero or empty onset. Even among languages which do have liaison, the domain for liaison differs from language to language. For example, in English, liaison occurs in the foot domain while French has liaison in the domain of the phonological phrase, so that liaison is much more frequently applied in French than in English, while it is not applied at all in German. In fact, Chinese does sometimes have liaison, but only between a lexical word and a function word, as exemplified in (2).6
(2)
In (2), the original syllable [a] in Chinese stands for an interjection, literally meaning ‘ah’, and is regarded as a function word. The examples in (2) show that in Chinese, liaison occurs between a lexical word ending with a nasal or a (glide-like) high vowel and a function word that starts with a vowel, which strongly suggests that there is no zero shengmu in Chinese syllable structure.
Regardless of whether the Chinese syllable has a zero shengmu or not, the syllable structure in traditional Chinese phonology (see Figures 2 and 3) is problematic in itself in several respects. First, the idea that the final is ternary branching (glide, nucleus, and coda) is not compatible with current phonological theory, which strongly favors binarity (Yip, 2003). Second, nucleus+coda, which is the rhyming unit, does not form a constituent in the traditional Chinese IF syllable structure. Compared with the IF model, the OR model does not have a special slot for the glide, which is in the onset, while a high vowel is in the nucleus. For example, in English the orthographic “i” is an onset glide in ‘pian’ /pjɑ:n/ and a high vowel in ‘piano’ /piˈænəʊ/, while in Chinese, [i] in [pi] ‘pen’ is a V unit while [i] in [pie] ‘other’ is a G unit, which does not belong to the initial. Since the traditional Chinese syllable structure is problematic, the question arises of whether the OR model, sometimes assumed to be universal, is suitable for an analysis of Chinese syllables. One critical point concerns the syllabic affiliation of the prenuclear glide in Chinese.
3. Attempts to Analyze the Syllabic Affiliation of the Prenuclear Glide
Many attempts have been made to analyze the Chinese prenuclear glide (e.g. [j] in bian [pjan] ‘change’) in the light of current phonological theories, particularly in the onset-rhyme (OR) model, based on the phonetic and phonological data for Chinese. Generally speaking, there are six different representations that treat the syllabic affiliation of the prenuclear glide in different ways. Representation 1 proposes that the prenuclear glide occupies the second position in the onset since the glide cannot be the nucleus. Representation 2 proposes that the glide is part of the nucleus on the grounds that Chinese does not have onset clusters. Representation 3 regards the glide as a secondary articulation of the onset consonant since in this view it neither belongs to the nucleus nor forms the second part of the onset. In representation 4, the glide is proposed to occupy an independent branch directly linking to the syllable node because it is independent both of the nucleus and of the onset. Representation 5 extends the IF model with a binary branching F consisting of G(lide) and R(hyme) and a binary branching R with N(ucleus) and C(oda), and is intended only for Chinese. Finally, representation 6 puts forward a universal X-bar model of the syllable to replace the OR model, based on a syntactic X-bar structure.
3.1 The Prenuclear Glide as the Second Member of the Onset Clusters
In both OR theory and traditional Chinese phonology, a syllable consists of two parts; in the latter these are initial (shengmu) and final (yunmu). However, the Chinese final is not the unit that plays a role in poetic rhyme, since the rhyming unit excludes the prenuclear glide. That is, the prenuclear glide is not counted as part of the rhyming unit, as exemplified in (3).7
(3)
In the poem in (3), the final words in lines a, b, and d have the same rhyming ending ‘-ɑn’ [an], which indicates that the Chinese poetic rhyming unit excludes the prenuclear glide, like [u] in [uan], [j] in [jan], or [i] in [ian] of the final words in lines a, b and d, respectively. Some experimental phonetic analyses (e.g., Howie, 1976; Lin, 1995, among many others) have also found evidence that the tone-bearing unit (TBU) of a Chinese syllable equals that of the rhyming unit, and excludes the initial consonant and the prenuclear glide. Based on such facts, only V(X) or N(C) forms a rhyming unit in Chinese poetic writing and is the TBU. Some scholars (e.g., Bao, 1990; Bao et al., 1997; Tung, 1983; Yin, 1989) argue that the prenuclear glide in Chinese syllable structure can only be the second member of the onset cluster since in the OR model if the glide is not in the nucleus, it must be in the onset. This proposal is illustrated in Figure 8 (taking [lwan] ‘egg’ as an example), parallel to the syllable structure of English ‘twin’ (see Figure 9).8

Figure 8. Glide belongs to onset.

Figure 9. The syllable structure of ‘twin’.
Figure 8 suggests that Chinese syllables can have onset clusters, which contradicts the Chinese phonological tradition and the factual data on some phonological processes in Chinese, including speech errors and the fanqie system of Middle Chinese (see Wang, 1998/1963, pp. 28–29). In Middle Chinese (ca. 1000 AD), fanqie was a method for describing the pronunciation of a Chinese monosyllabic character by combining the sound of the initial from the first syllable and the sound(s) of the final from the second syllable. If the target syllable to be described contained a prenuclear glide, the fanqie method treated the prenuclear glide sometimes as the initial part of the first syllable and sometimes as the final part of the second syllable (Yang, 1990), but more often as the final part of the second syllable (Pan, 2001; Wang, 1998/1963, p. 29).9 See some examples of fanqie from Wang (1998/1963, p. 29) in Table 2.10
Table 2. Examples of fanqie (Wang, 1998/1963, p. 29)
| | Fanqie in Chinese | | Target syllable | Source syllables | Process |
---|---|---|---|---|---|
a. | 条 徒聊切 | → | [tʰiau] | [tʰu] [liau] | [tʰ] + [iau] |
b. | 田 徒年切 | → | [tʰian] | [tʰu] [nian] | [tʰ] + [ian] |
c. | 桓 胡官切 | → | [xuan] | [xu] [kuan] | [x] + [uan] |
d. | 香 许良切 | → | [ɕiaŋ] | [ɕy] [liaŋ] | [ɕ] + [iaŋ] |
e. | 黄 胡光切 | → | [xuaŋ] | [xu] [kuaŋ] | [x] + [uaŋ] |
In Table 2, all the target syllables contain a prenuclear glide and the Middle Chinese fanqie system treats the prenuclear glide as the second part of the syllable; this is the pattern on which the traditional Chinese syllable structure is based, as presented in Figure 2 or Figure 3. Therefore, the argumentation that the prenuclear glide is the second part of the onset in an OR representation of the syllable is not well grounded.
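The fanqie procedure for the majority pattern (the glide travels with the final of the second syllable) can be sketched as follows. This is a minimal illustration, not a reconstruction of Middle Chinese practice: the vowel set and the function names are assumptions made here, syllables are plain IPA strings without tone, and a syllable beginning with [j] or [w] would wrongly be treated as consonant-initial.

```python
# Assumed vowel symbols; anything before the first vowel counts as the initial.
VOWELS = set("aeiouyɤə")


def split_initial_final(syllable: str) -> tuple[str, str]:
    """Split a syllable into initial (shengmu) and final (yunmu)."""
    for i, seg in enumerate(syllable):
        if seg in VOWELS:
            return syllable[:i], syllable[i:]
    raise ValueError(f"no vowel found in {syllable!r}")


def fanqie(first: str, second: str) -> str:
    """Combine the initial of the first syllable with the final of the second."""
    initial, _ = split_initial_final(first)
    _, final = split_initial_final(second)
    return initial + final


# Rows c, d and e of Table 2
print(fanqie("xu", "kuan"))   # 桓
print(fanqie("ɕy", "liaŋ"))   # 香
print(fanqie("xu", "kuaŋ"))   # 黄
```

Because the split point falls before the first vowel symbol, the prenuclear glide always ends up in the final, which is exactly the behavior that the onset-cluster analysis fails to predict.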
3.2 The Prenuclear Glide as Part of the Nucleus
Some scholars (e.g., Wang & Chang, 2001) have taken findings like these and proposed that Chinese does not have onset clusters so that the prenuclear glide in Chinese must be part of the nucleus in the OR model, following the idea that a segment must be part of the rhyme if it is not a part of the onset. Accordingly, Wang and Chang (2001) and others posit that the syllable structure of [lwan]/[luan] ‘egg’ in Chinese is like that of English ‘gourd’, as presented in Figure 10 and Figure 11, respectively.

Figure 10. Glide as part of nucleus.

Figure 11. The syllable structure of ‘gourd’.
According to Figure 10, in which the prenuclear glide [u] and the nuclear vowel [a] form a diphthong [ua], the glide, as the first element of a diphthong, must be included in the rhyming unit. But, as we saw in (3), the Chinese poetic rhyming system excludes the prenuclear glide. Besides, speech errors in Chinese show that in syllables with [ua]/[wa], the [u] and [a] do not always behave as a unit (evidence will be presented in (4)). The idea of treating the prenuclear glide as part of the nucleus is therefore also unacceptable. In the OR model, the Chinese prenuclear glide is really problematic in terms of syllabic affiliation: It cannot be part of either the onset or the rhyme.
3.3 The Prenuclear Glide as a Secondary Articulation
Some scholars (e.g., Duanmu, 1990, 2007; Wang, 1993) have maintained that Chinese syllables do not have onset clusters but that the glide is not part of the nucleus either. To solve this conundrum, Duanmu (1990) and Wang (1993) adduced phonetic evidence on the different realizations of CGVC syllables in Chinese and English. For example, the Chinese word /suan/ or /swan/ (酸) ‘sour’ and the English /swɑn/ ‘swan’ have similar segment sequences, but their initial consonants are realized differently. The acoustic analysis shows that the initial consonant of Chinese /swan/ is more strongly labialized [sʷ] than it is for [s] in English /swɑn/. It was also found that the duration of the Chinese syllable CGVC (e.g., swan/suan) is almost the same as that of a word with a CVC syllable structure (e.g., san), while in English the duration of CGVC (e.g., swan) is clearly longer than that of CVC (e.g., sun). Duanmu (1990) and Wang (1993) regard this as evidence that the prenuclear glide in Chinese does not have an independent slot in the syllable structure and claim that the prenuclear glide forms a secondary articulation on the onset consonant. The result is that the maximal syllable structure of Chinese is CVX, rather than CGVX. In their proposal, the Chinese syllable for ‘egg’ can be presented as [lʷan], with the prenuclear glide being a secondary articulation on the onset consonant, as in Figure 12.

Figure 12. Glide as the secondary articulation.
The proposal for the Chinese syllable structure captured in Figure 12 is supported by experimental phonetic analysis but is not supported by native speakers’ cognition of syllable structure, which distinguishes between shengmu (C) and yunmu (GVX, maximally), a distinction which has been cherished for more than 1,000 years in the mind of the Chinese people. It is also not compatible with factual data on Chinese, such as most examples of the fanqie system of Middle Chinese, as presented in Table 2. This is because in the majority of cases, the fanqie system breaks up syllables like [lwan]/[luan] into [l] and [u/wan], as was illustrated in Table 2, and this is unexpected if /lʷ/ forms a unit segment. Finally, if the prenuclear glide forms a secondary articulation on the initial consonant, then /l/, /lʷ/, and /lʲ/ would be three different lateral phonemes because the syllables /lan/ ‘blue’, /lʷan/ ‘twin’, and /lʲan/ ‘connect’ form a minimal triplet. Although this is possible in principle, it would double the number of consonants in Chinese from the traditional inventory of 24 consonants (including 2 glides). In that case, the Chinese consonant inventory would be as large as 48 consonants, as shown in Table 3.
Table 3. A Consonant Inventory with Glide as Secondary Articulation
| | bilabial | labio-dental | alveolar | retroflex | palatal | velar |
---|---|---|---|---|---|---|
plosive | | | | | | |
fricative | | f | s | ʂ ʂʷ | | x |
nasal | m | | n | | | |
affricate | | | | tʂ tʂʰ tʂʷ tʂʷʰ | | |
approximant | w | | | | j | |
Table 3 presents a very unusual consonant inventory, which results from the assumption that the maximal syllable structure of Chinese is CVX without a segment position for the prenuclear glide. In terms of economy, increasing the number of phonemes can greatly reduce the number of syllables, and Duanmu (2017) accordingly argued for preferring a small number of syllables at the cost of enlarging the segment inventory. Moreover, the reason given by Duanmu (1990, 2007) and Wang (1993) for their argument that the prenuclear glide is a secondary articulation is also not convincing from a phonetic point of view, since the duration of Chinese syllables does not depend on the number of segmental slots but is largely correlated with tone type. Feng et al. (2001, p. 68) carried out an experimental phonetic analysis of the correlation between tone type and syllable duration and found that syllables with five different tones (i.e., T1 [55], T2 [35], T3 [214], T4 [51] and the neutral tone) differ in length, measuring 245 ms, 265 ms, 252 ms, 238 ms, and 188 ms, respectively. For example, the syllable [ma35] ‘hemp’ (with two segment positions) can be longer than [liaŋ51] ‘bright’ (with four segment positions, in the traditional view). Finally, the onset [sʷ] in [suan] ‘sour’ only shows phonetic labialization when [s] is followed by [u] underlyingly, as in /suan/ and /suei/, which may surface phonetically as [swan] and [swei], respectively. In fact, /s/ is without exception labialized into [sʷ] when followed by /u/ in Chinese, regardless of whether /u/ is a glide or a vowel.
3.4 The Prenuclear Glide Independent of Onset and Rhyme
Some scholars (e.g., Shen, 1992; Yip, 2003) have proposed that the prenuclear glide is independent of both the onset and the rhyme, based on the fact that sometimes it occurs independently of the initial, sometimes it occurs independently of the final (as in the fanqie system), and sometimes it is found on its own. The last fact can be exemplified by speech errors in Chinese, as presented in (4).11
(4)
Example (4a) suggests that the CG sequence forms a unit that can replace or be replaced by the C(G) sequence of another syllable. Example (4b) suggests that C by itself is a unit that can replace or be replaced by C of another syllable, excluding G. Examples (4c, d) suggest that GV(X) is a unit that can replace VX, or VX can be replaced by GV(X) between two adjacent syllables. Example (4e) suggests that G of one syllable can replace or be replaced by G of another syllable independently. These examples show that the prenuclear glide sometimes combines with the onset consonant, sometimes with the rhyme, and sometimes is independent of both. On the basis of many such examples from Shen (1992), Yip (2003) proposes a syllable structure suitable for Chinese in which the prenuclear glide directly links to the syllable node between onset and rhyme, as presented in Figure 13 (taking [lwan] ‘egg’ as an example).

Figure 13. Prenuclear glide independent of onset and rhyme.
The idea of the syllable structure with an independent prenuclear glide may capture the flexible behavior of the prenuclear glide. In practice, however, it cannot explain why the glide sometimes goes together with the onset consonant and sometimes with the final VX in speech errors, since neither CG nor GVX is a constituent in the syllable structure proposed in Figure 13. Moreover, theoretically, the ternary syllable structure is not compatible with the binary branching principle of contemporary linguistic theory. Because of such problems, Yip (2003) herself does not adopt the proposal put forward in Figure 13. However, she does point out that the complexity of the Chinese syllable casts doubt on the generally accepted OR model.
3.5 The Prenuclear Glide as Head of Final
It is no surprise, then, that many scholars have expressed the view that representing the Chinese syllable structure in the OR model is problematic. Some scholars (Cheng, 1966, pp. 135–158; Li, 1999, p. 75; Lin, 1989, pp. 72–102, among many others) take the position that Chinese syllable structure is very special and different from that of other languages. They adhere to the traditional IF model, but propose the IF model should include binary branching, as presented in Figure 14 (again taking the Chinese syllable /lwan/ ‘egg’ as an example).

Figure 14. A modified IF model.
In Figure 14, the prenuclear glide is head of the final, which is a constituent dominating the lower constituent of rhyme, each having a binary branching structure. Theoretically, the modified IF model improves on the traditional IF model in Figure 3. With respect to the empirical facts, too, it captures different kinds of phonological behavior that we have seen:
In the traditional Chinese fanqie system, the syllable is divided into two parts: initial and final.
In the Chinese poetic rhyming system, the rhyme is a unit constituent.
In the modified IF model for the Chinese syllable, the initial, glide, and coda are optional since Chinese has grammatical GVX, VX, and V syllables (recall Table 1). The examples /wan/ ‘play’, /an/ ‘peace’, and /ɤ/ ‘goose’ can be analyzed in the modified IF model, as shown in Figure 15.

Figure 15. GVX, VX, and V syllable structures in IF.
The modified IF model captures the internal structure of the different Chinese syllable types. Regardless of whether there is an initial, glide, or coda, the final is a constituent of which the rhyme is a subconstituent that forms the poetic rhyming unit and TBU in Chinese. But the modified IF model looks somewhat odd, since it retains the constituent of final even when there is no G, as presented in (b) and (c) of Figure 15. What is more, this proposal would only hold for Chinese syllables, and is not intended as a universal syllable model. For instance, if this modified IF model is used to represent English syllables, the constituent of final is completely superfluous. This raises the question of whether yet another syllable structure model is available that can capture the facts of Chinese outlined so far and is also suitable for other languages.
3.6 The Prenuclear Glide as a Specifier of N'' in an X-Bar Model
Based on a syntactic X-bar structure (Chomsky, 1986, 1995), Zhang (2006) and van de Weijer and Zhang (2008) propose an X-bar model not only for Chinese syllables, but for syllable structure in general. Syntactically, it has been claimed that in an X-bar structure there can be multiple specifiers, with two X′′s dominating X′, each with its own specifier (see Chomsky, 1995: Chapter 4; Hornstein, 1999), as presented in Figure 16.

Figure 16. A syntactic structure with two X′′s.
In this multiple-spec construction, there can be two XPs (namely, and X′′), each having a specifier, and only X is the head, with XP as its maximal projection. In syntax, either of the two XPs can be omitted, which can be presented in Figure 17 (represented in the syntax of the sentence ‘We all read books’).

Figure 17. The structure of ‘We all read books’.
The sentence ‘We all read books’ has two XPs, as presented in Figure 17. Syntactically speaking, ‘We read books’ or ‘All read books’ are both grammatical maximal projections of V. Zhang (2006) and van de Weijer and Zhang (2008) propose using this syntactic X-bar structure as syllable structure. This means that the maximal projection is the nucleus (N), since N is the only compulsory element of a syllable, parallel to the verb in syntax. In the multiple-spec X-bar structure, the syllable is , which can dominate another N′′, while the onset consonant and the prenuclear glide are specifiers for and N′′, respectively. Each of these can be omitted; that is, they are optional. The X-bar syllable model is presented in Figure 18.

Figure 18. An X-bar syllable model.
The X-bar syllable model precisely captures the internal organization of the maximal Chinese syllable CGVX (recall that X can either be a high vowel [i] or [u], or either of the nasals [n] and [ŋ]), for example, /lwan/ ‘egg’. In the syllable, G is optional, for example /lan/ ‘blue’, and so is the initial C, for example /wan/ ‘play’, or even C and G, as in /an/ ‘peace’, showing that C, G, and X are all optional in Chinese: only V is compulsory. These four different syllable types can be analyzed in the X-bar syllable model as shown in Figure 19.

Figure 19. CGVC, CVC, GVC, and V syllables in the X-bar syllable model.
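The way optional specifiers add or remove layers in the X-bar syllable can be sketched with a small bracketing function. It is a hedged illustration only: the function name and its parameters are hypothetical, node labels are left out, and the output simply follows the labelless bracket notation [[C] [[G] [VX]]].

```python
from typing import Optional


def xbar_brackets(nucleus: str,
                  onset: Optional[str] = None,
                  glide: Optional[str] = None,
                  coda: Optional[str] = None) -> str:
    """Build a labelless bracketing of a syllable in the X-bar model."""
    if not nucleus:
        raise ValueError("the nucleus is the only compulsory element")
    # Innermost constituent: the head V plus its optional complement X.
    structure = f"[{nucleus}{coda or ''}]"
    # The prenuclear glide, if present, is the specifier of the next projection.
    if glide:
        structure = f"[[{glide}] {structure}]"
    # The onset consonant, if present, is the specifier of the highest projection.
    if onset:
        structure = f"[[{onset}] {structure}]"
    return structure


print(xbar_brackets("a", onset="l", glide="w", coda="n"))  # CGVC
print(xbar_brackets("a", onset="l", coda="n"))             # CVC
print(xbar_brackets("a", glide="w", coda="n"))             # GVC
print(xbar_brackets("ɤ"))                                  # V
```

A projection is created only when its specifier is present, so no empty constituent is left behind when C or G is missing.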
Theoretically, in the X-bar syllable model N is always the head of a syllable, regardless of its type. Empirically, the X-bar syllable model can not only represent all types of Chinese syllables, but can also account for the range of findings about Chinese that have been discussed so far:
It satisfies native speakers’ cognition, supported by the fanqie system that a syllable divides into two parts: a specifier C and an N′′ or an N′ constituent.
It captures the fact that the Chinese poetic rhyming unit and TBU is the N′ constituent.
It shows that Chinese speech errors sometimes refer to the domain of N′′, in which GVX go together, and sometimes the domain of N′, in which only VX go together, leaving CG to remain in place or be replaced.
The difference between the X-bar syllable model and the modified IF model is that in the latter the F constituent is compulsory, which is odd and seems to have no purpose in terms of the representation of syllables in other languages like English, while in the former model one N′′ is optional, according to syntactic X-bar theory. Accordingly, the syllable structure of English /twɪn/ ‘twin’ can be presented as shown in Figure 20.

Figure 20. English /twɪn/ in the X-bar model.
Although the X-bar syllable model seems better equipped than the other syllable models discussed previously, it has not been generally accepted. One point of debate is whether syllable structure should run parallel to syntactic structure, given that the two differ cross-linguistically: syntactic structure allows recursion, while syllable structure does not. However, such parallelism between linguistic modules could be regarded as an advantage (see, recently, Yang & van de Weijer, 2021, who argue in favor of the X-bar model of the syllable in the context of the HPSG framework; see note 12).
So far, we have presented a critical analysis of the six different representations of the Chinese syllable structure and commented on their advantages and disadvantages based on factual data on features such as phonological constraint, poetic rhyming unit, fanqie, speech errors, and universality, which are summarized in Table 4.
Table 4. Evaluation of the Six Representations of Chinese Syllable Structure
Representation | *Complex (O) | Rhyming unit | Fanqie | Speech errors | Universality |
---|---|---|---|---|---|
R1: [[CG] [VX]] | No | Yes | Yes/No | Yes/No/No | Yes |
R2: [[C] [[GV] [X]]] | Yes | No | No/Yes | No/No/Yes | Yes |
R3: [[Cᴳ] [VX]] | Yes | Yes | Yes/No | Yes/No/No | Yes |
R4: [[C] [G] [VX]] | Yes | Yes | Yes/No | No/Yes/No | No |
R5: [[C] [G [VX]]] | Yes | Yes | Yes/Yes | No/Yes/Yes | No |
R6: [[C] [[G] [VX]]] in X-bar theory | Yes | Yes | Yes/Yes | No/Yes/Yes | Yes |
Table 4 presents a general picture of evaluation of the six representations of Chinese syllable structure. Among the five aspects of evaluation, “*complex (O)” means no onset clusters. In the fanqie system, sometimes [GVX] is a constituent and sometimes only [VX] is a constituent so that two marks are needed from the two aspects. In speech errors, there are three possibilities: sometimes [CG] is a constituent (see “a” in (4)), sometimes [G] is independent of both C and VX (see “e” in (4)), and sometimes [GVX] is a constituent (see “c and d” in (4)), so that we have three marks respectively. In the evaluation boxes, “Yes” means satisfaction of the evaluation item concerned and “No” means violation.
4. The Chinese Phonotactic Constraints as Theta Rules in an XP
The X-bar syllable structure can also be used to capture further details of Chinese phonotactics, on the basis of the configurational relations between segments in the syllable domain. These phonotactic constraints mirror the theta rules which capture the configurational relations between specifier and head and between head and complement in syntax. In the Chinese syllable, there are constraints between C and G, G and V, C and V, and V and X, parallel to syntactic parameters which configure an XP. In the X-bar syllable model, the constraints of phonotactics of Chinese can be presented as in Figure 21.

Figure 21. Constraints on configurational relations between segments.
Figure 21 identifies six different positional constraints on the configurational relations between segments in a Chinese syllable. Constraint-1 is OCP (Labial), which captures the relation between C and G; constraints-2 and 3 are *+back(C)-back(H) and Agreement-CV(Laminal) on the cooccurrence relations between C and G on the one hand, and C and V on the other; constraint-4 is Agreement-C-HV(apical&retroflex), which constrains relations between C and V; constraint-5 is OCP(High) for the relations between G and V or between V and V; constraint-6 is Coda-Condition (n/ŋ). These optimality-type constraints are defined in (5–10):
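(5) OCP(LAB): a labial consonant may not be followed by a labial glide in CG.
(6) *+bk(C)-bk(H): a back consonant in C may not be followed by a high front glide or vowel.
(7) AGREE-CV(L): a laminal sibilant consonant must be followed by a laminal high front vowel.
(8) AGREE-C-HV(A&R): an apical consonant and a high front vowel in a CV syllable must agree in [apical] and [retroflex].
(9) OCP(H): the two segments of GV, or of VV in the rhyme, may not both be [+high].
(10) CODA-CON(n/ŋ): only the nasals [n] and [ŋ] may occur in coda position in a closed syllable.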
A well-formed syllable in Chinese must satisfy all of these constraints. Constraint-1 is OCP(LAB), which forbids a sequence of a labial consonant and a labial glide as CG, so that labial consonants like [ph], [p], [m], or [f] cannot be followed by a labial glide [u]/[w]. This correctly captures the ill-formedness of syllables like *[phuo], *[puo], *[muo], and *[fuo], while [uo] itself is a well-formed GV structure. Note that, also correctly, OCP(LAB) does not forbid a nonlabial consonant followed by a labial glide, or a labial consonant followed by a labial vowel as CV in a CV(X) syllable, as exemplified in (11).
(11)
The examples in (11a) show that [uo] is a well-formed GV structure which can be preceded by a nonlabial consonant (see note 13) to yield grammatical syllables (other examples include [thuo] ‘camel’, [luo] ‘mule’, [suo] ‘lock’, [nuo] ‘glutinous’, [tsuo] ‘sit’, and [khuo] ‘broad’). In the examples in (11b) and (c), the two adjacent segments are both labial, but [u] and [o] are vowels, which shows that this labial CV structure is also grammatical (see note 14). In fact, Chinese has no [o] as a simplex final. It is believed that [o] only occurs with [u] in either [uo] or [ou], as in (a) and (d). When [uo] occurs after a labial C, however, the prenuclear glide [u] is deleted because of OCP(LAB), so that [puo], [phuo], [muo], and [fuo] surface as [po] ‘wave’, [pho] ‘slope’, [mo] ‘grind’, and [fo] ‘Buddha’, respectively (for more details see Wang, 1998/1963, p. 19; Zhang, 2020, pp. 208–209).
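The glide deletion just described can be sketched in Python as follows. This is only a minimal illustration: the function name `apply_ocp_lab`, the ASCII spelling `ph` for the aspirated stop, and the heuristic that an initial `u` counts as a glide only when another vowel follows are assumptions made for the example.

```python
# Labial onsets named in the text (aspiration written with a plain "h").
LABIAL_ONSETS = {"p", "ph", "m", "f"}
# Assumed vowel symbols, used only to tell a glide [u] from a nuclear [u].
VOWELS = set("aeiouyə")


def apply_ocp_lab(onset: str, final: str) -> str:
    """Delete a prenuclear [u] after a labial onset; otherwise keep the input."""
    glide_present = (len(final) > 1 and final[0] == "u"
                     and final[1] in VOWELS)
    if onset in LABIAL_ONSETS and glide_present:
        return onset + final[1:]  # the glide is dropped, the rest survives
    return onset + final


if __name__ == "__main__":
    for onset in ["p", "ph", "m", "f", "l", ""]:
        print(repr(onset), "+ uo ->", apply_ocp_lab(onset, "uo"))
```

A nuclear [u] after a labial onset is left untouched by this sketch, in line with the observation that labial CV sequences are grammatical.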
Constraint-2 is *+bk(C)-bk(H), which requires that if the specifier C is a back consonant, the following segment must not be a high front glide or vowel. In Chinese there are only three back consonants, [k], [kh], and [x] (pinyin h), which appear in the specifier position C. Accordingly, sequences like *[kian], *[khy], and *[xi] are illegal. These back consonants can only be followed by a back vowel/glide or a low vowel. Syllables with an initial back consonant may thus include those in (12) (tones omitted):
(12)
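A minimal Python sketch of this constraint is given here for illustration. The function name `violates_back_front`, the split of a syllable into an onset string and a final string, and the ASCII spelling `kh` are assumptions made for the example.

```python
# The three back consonants named in the text (aspiration written as "h").
BACK_CONSONANTS = {"k", "kh", "x"}
# High front segments that may not directly follow a back consonant.
HIGH_FRONT = {"i", "y"}


def violates_back_front(onset: str, final: str) -> bool:
    """True if a back onset is directly followed by a high front segment."""
    return onset in BACK_CONSONANTS and final[:1] in HIGH_FRONT


if __name__ == "__main__":
    for onset, final in [("k", "ian"), ("kh", "y"), ("x", "i"),
                         ("k", "uan"), ("x", "an"), ("t", "ian")]:
        print(onset + final, violates_back_front(onset, final))
```

Only the first segment of the final is inspected, since the constraint concerns the segment immediately after C.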
Constraint-3 is AGREE-CV(L), which demands that a laminal sibilant consonant must be followed by a laminal high front vowel. In Chinese there are only three laminal sibilants [tɕ], [tɕh], and [ɕ], and two laminal high front vowels [i] and [y], in addition to 12 combinations that start with [i] or [y], including [ia], [ian], [iaŋ], [iau], [ie], [in], [iŋ], [ioŋ], [iou], [yan], [ye], and [yn]. There are altogether only 42 licit syllables starting with [tɕ], [tɕh], or [ɕ] in Chinese, regardless of tones. Some examples are given in (13).
(13)
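The figure of 42 follows from simple multiplication: three laminal sibilants combine with 14 finals (the two high front vowels plus the 12 listed combinations). The short Python sketch below spells this out; the function name `laminal_syllables` is hypothetical, and the inventory is copied from the paragraph above.

```python
from itertools import product

LAMINAL_SIBILANTS = ["tɕ", "tɕh", "ɕ"]
HIGH_FRONT_VOWELS = ["i", "y"]
# The 12 combinations beginning with [i] or [y] listed in the text.
COMBINATIONS = ["ia", "ian", "iaŋ", "iau", "ie", "in", "iŋ",
                "ioŋ", "iou", "yan", "ye", "yn"]


def laminal_syllables() -> list:
    """Pair every laminal sibilant with every licit final (tones ignored)."""
    finals = HIGH_FRONT_VOWELS + COMBINATIONS
    return [c + f for c, f in product(LAMINAL_SIBILANTS, finals)]


if __name__ == "__main__":
    syllables = laminal_syllables()
    print(len(syllables))  # 3 onsets x 14 finals
```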
Constraint-4 is AGREE-C-HV(A&R), which requires that an apical consonant and a high front vowel in a CV syllable must agree in the features of [apical] and [retroflex]. In Chinese there are six apical consonants, of which three are retroflex, and two apical vowels, as presented in Table 5.
Table 5. Apical Consonants and Vowels in Chinese
Feature | ts | tsh | s | tʂ | tʂh | ʂ | ɿ | ʅ |
---|---|---|---|---|---|---|---|---|
[apical] | + | + | + | + | + | + | + | + |
[retroflex] | – | – | – | + | + | + | – | + |
Note: The apical vowels [ɿ] and [ʅ] in Chinese are not regarded as standard IPA symbols; however there are such vowel-like sounds in some other languages, although they are far from common (Maddieson, 1984). Ladefoged and Maddieson (1996, p. 314) describe the apical vowels as fricative vowels.
According to AGREE-C-HV(A&R), an apical nonretroflex vowel [ɿ] only occurs after an apical nonretroflex sibilant [ts], [tsh], or [s], and an apical retroflex vowel [ʅ] only occurs after the apical retroflex sibilants [tʂ], [tʂh], or [ʂ]. Thus, in Chinese when the specifier C is an apical consonant and it is followed by a high front vowel, altogether six syllables are allowed in terms of segmental structure, as presented in (14).
(14)
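The agreement requirement can be illustrated with a small Python sketch that pairs each apical consonant with the apical vowel sharing its [retroflex] value. The names `RETROFLEX_C`, `RETROFLEX_V`, `agrees`, and `licit_apical_syllables` are hypothetical, and the feature values are taken from Table 5 on the assumption that its consonant columns are [ts], [tsh], [s], [tʂ], [tʂh], and [ʂ].

```python
# [retroflex] values of the apical consonants and vowels (after Table 5).
RETROFLEX_C = {"ts": False, "tsh": False, "s": False,
               "tʂ": True, "tʂh": True, "ʂ": True}
RETROFLEX_V = {"ɿ": False, "ʅ": True}


def agrees(consonant: str, vowel: str) -> bool:
    """Both segments are [+apical]; they must also match in [retroflex]."""
    return RETROFLEX_C[consonant] == RETROFLEX_V[vowel]


def licit_apical_syllables() -> list:
    """Enumerate the CV syllables that satisfy the agreement requirement."""
    return [c + v for c in RETROFLEX_C for v in RETROFLEX_V if agrees(c, v)]


if __name__ == "__main__":
    print(licit_apical_syllables())
```

Of the twelve logically possible consonant-vowel pairs, this filter keeps exactly the six in which the [retroflex] values match.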
Constraint-5 is OCP(H), which does not allow the two segments GV or the Rhyme VV both to be [+high], so that *[iu] and *[ui] are unacceptable. Combinations of the glides with mid or low vowels are acceptable, however. For CGVV, CGV, or CVV structures, the grammatical syllables include the following examples in (15).
(15)
In the examples in (15), which involve either on-glides, off-glides, or both, the glides are [+high], so the nuclear vowel must be [-high]. In Chinese, the finals *[ui] and *[iu] are ill-formed, although the pinyin transcription system somewhat misleadingly has ‘gui’ and ‘liu’ in (15b) and (c), whose actual phonetic realizations are in fact [kuei] and [liou]. Other examples are pinyin ‘wu’ for ‘乌’ (crow) and ‘yi’ for ‘衣’ (clothes), which look like combinations of two [+high] segments, violating OCP(H). In fact, however, the initial glides in the syllables ‘wu’ and ‘yi’ are best analyzed as inserted for pinyin, so that the real phonetic representations of these syllables are [u] for ‘乌’ (crow) and [i] for ‘衣’ (clothes), respectively: see Wang (1998/1963, p. 28).
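The contrast between the pinyin spellings and their phonetic forms can be checked with a minimal Python sketch of OCP(H). The function name `violates_ocp_high` is hypothetical, each character is treated as one segment, and the set of [+high] segments is assumed to be [i], [u], and [y].

```python
# Assumed set of [+high] vocalic segments (glides and vowels alike).
HIGH = {"i", "u", "y"}


def violates_ocp_high(final: str) -> bool:
    """True if any two adjacent segments of the final are both [+high]."""
    return any(a in HIGH and b in HIGH for a, b in zip(final, final[1:]))


if __name__ == "__main__":
    for final in ["ui", "iu", "uei", "iou", "ua", "ie"]:
        print(final, violates_ocp_high(final))
```

The spellings `ui` and `iu` are flagged, whereas the phonetic forms [uei] and [iou] pass because a [-high] nucleus separates the two high segments.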
Constraint-6 is CODA-CON (n/ŋ), which demands that only the nasals [n] or [ŋ] can occur in coda position if the syllable is closed.
With these six constraints, the set of possible Chinese syllables is finite and quite limited. According to Lu’s (2001, p. 29) analysis, based on the Xinhua Dictionary, in Chinese there are 416 syllables altogether, without taking tones into account, and a total of 1,319 syllables with tones. The phonotactics of Chinese can thus be captured mainly by these six constraints governing the positions of the different segments in an X-bar syllable structure, in a similar way to that in which configurational relations are expressed by theta rules between syntactic elements in an XP structure.
5. Conclusion
In summary, Chinese syllable structure, though simple on the surface, has some surprising complexities when analyzed in more detail, not least because of the syllabic affiliation of the prenuclear glide and the exact formulation of phonotactic constraints between segments in the syllable domain. Analysis of these complexities will shed light on the cross-linguistic representation of syllable structure and the extent to which principles of organization in different modules of the grammar can be unified in general.
References
- Bao, Z. (1990). Fanqie languages and reduplication. Linguistic Inquiry, 21, 317–350.
- Bao, Z., Shi, J., & Xu, D. (1997). Generative phonology: Theory and usage. Beijing, China: China Social Sciences Press.
- Cheng, R. L. (1966). Mandarin phonological structure. Journal of Linguistics, 2, 135–158.
- Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press.
- Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.
- Duanmu, S. (1990). A formal study of syllable, tone, stress and domain in Chinese languages. (Doctoral dissertation, MIT.)
- Duanmu, S. (2007). The phonology of standard Chinese (2nd ed.). Oxford, UK: Oxford University Press.
- Duanmu, S. (2014). Syllable structure and stress. In C.-T. J. Huang, Y.-H. A. Li, & A. Simpson (Eds.), The handbook of Chinese linguistics (pp. 422–432). Chichester, UK: Wiley Blackwell.
- Duanmu, S. (2017). From non-uniqueness to the best solution in phonemic analysis: Evidence from Chengdu Chinese. Lingua Sinica, 3(15), 1–23.
- Feng, Y., Chu, M., He, L., & Lv, S. (2001). A statistic analysis of syllable duration of the Chinese speech. The processing of the papers at the conference of modern phonetics (pp. 66–69).
- Hornstein, N. (1999). Minimalism and quantifier raising. In S. D. Epstein & N. Hornstein (Eds.), Working minimalism. Cambridge, MA: MIT Press.
- Howie, J. M. (1976). An acoustic study of Mandarin tones and vowels. Cambridge, UK: Cambridge University Press.
- Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Oxford, UK: Blackwell.
- Li, C. W.-C. (1999). A diachronically motivated segmental phonology of Mandarin Chinese. New York, NY: Peter Lang.
- Lin, M. (1995). A perceptual study on the domain of tones in Beijing Mandarin. Acta Acustica, 6, 437–445.
- Lin, Y. (1989). Autosegmental treatment of segmental processes in Chinese phonology. (Ph.D. Dissertation, University of Texas, Austin)
- Lu, W. (2001). The disagreement on number (quantity) and composition distribution of Modern Chinese syllables. Language Teaching and Linguistic Studies, 6, 28–34.
- Maddieson, I. (1984). Patterns of sounds. Cambridge, UK: Cambridge University Press.
- Pan, W. (2001). The behavior and principle of Fanqie. Studies of the Chinese Language, 2, 99–111.
- Pike, K. L., & Pike, E. V. (1947). Immediate constituents of Mazatec syllables. International Journal of American Linguistics, 13, 78–91.
- Shen, J. (1992). Types of speech errors. Studies of Chinese Language, 4, 306–316.
- Tung, Z.-H. (1983). Chinese syllables and English syllables. Taipei, Taiwan: Student Book.
- Van de Weijer, J., & Zhang, J. (2008). An X-bar approach to Mandarin Chinese syllable structure. Lingua, 118, 1416–1428.
- Wang, H. S., & Chang, C.-L. (2001). On the status of the prenuclear glides in Mandarin Chinese. Language and Linguistics, 2, 243–260.
- Wang, J. Z. (1993). The geometry of segmental features in Beijing Mandarin. (Ph.D. Dissertation, University of Delaware.)
- Wang, L. (1998). The Chinese phonology. Beijing, China: Zhonghua Book Company. (Original work published 1963)
- Xue, F. (1986). Analysis of the Chinese phonology. Taipei, Taiwan: Students’ Book Company.
- Yang, C., & van de Weijer, J. (2021). An HPSG approach to Chinese syllable structure and tone sandhi. Lingua, 258, 103048.
- Yang, Y. (1990). The theory and fanqie of the front and back medial glides in Li’s Description of Sounds. Journal of Xuzhou Normal College, 1, 165–172.
- Yin, Y.-M. (1989). Phonological aspects of word formation in Mandarin Chinese. (Ph.D. Dissertation, University of Texas)
- Yip, M. (2003). Casting doubt on the Onset-Rime distinction. Lingua, 113, 779–816.
- Zec, D. (2007). The syllable. In P. de Lacy (Ed.), The Cambridge handbook of phonology (pp. 161–194). Cambridge, UK: Cambridge University Press.
- Zhang, B. (2002). Modern Chinese. Shanghai, China: Fudan University Press.
- Zhang, J. (2006). The phonology of Shaoxing Chinese. (Ph.D. Dissertation. Leiden University, Utrecht)
- Zhang, J. (2020). Contrastive studies of English and Chinese phonology. Beijing, China: Foreign Language Teaching and Research Press.
Notes
1. In Chinese, there are also some disyllabic monomorphemic words, which are generally of three types: (1) some typical Chinese terms like [moli] (茉莉) ‘jasmine’; (2) alliterative or rhyming words, whose two syllables share the same onset or the same rhyme, like [phipha] (枇杷) ‘loquat’ and [meiguei] (玫瑰) ‘rose’; and (3) reduplicated words like [tɕhytɕhy] (蛐蛐) ‘cricket’. Such disyllabic words constitute one free morpheme, but in some cases, one of the two characters can represent the meaning of the disyllabic word.
2. In Duanmu (1990, pp. 78–79), a bimoraic monophthong is presented as a long vowel (e.g., [ɤ:]). But in this article, a bimoraic monophthong is not presented as a long vowel, following the well-acknowledged vowel inventory of Chinese (Wang, 1998/1963, pp. 12–22).
3. In the OR model (Pike & Pike, 1947), a syllable has two parts: onset and rhyme, the latter of which consists of nucleus and coda. The onset contains all consonants before the nuclear vowel and the coda all those after the nuclear vowel.
4. In the Chinese CGVX syllable structure, G stands for the glide, which is sometimes called a medial glide or a prenuclear glide; these terms all refer to the same thing. In this article, the term “prenuclear glide” is usually used.
5. The dot ‘.’ in the square bracket indicates the syllable boundary and is also the morpheme boundary. The [ɚ] sound is a rhotic (or rhotacized) mid vowel.
6. A syllable starting with a velar nasal is ungrammatical and therefore there is no Chinese character that represents [ŋa], which is only a phonetic realization.
7. This poem is titled “April in the Countryside”; it was composed by Weng Juan during the Song Dynasty (960–1279) and translated by He Gongjie in 2011. Every line of the poem is presented in pinyin, with Chinese characters beneath and the English translation on the right.
8. Throughout the whole text, the symbol “σ” stands for syllable, “O” for onset, “R” for rhyme, the initial “C” for consonant, “N” for nucleus, and the final “C” for coda.
9. In Yang (1990, p. 171), sometimes if the target syllable is [aŋ], then the source syllables can be [əŋ] and [ɕiaŋ] which contains the prenuclear glide, so that the process is []+[aŋ], resulting in the pronunciation of the target syllable [aŋ], which suggests that the prenuclear glide is not in the second part of a syllable in this case. See Table 2 for more examples of fanqie.
10. The phonetic transcription here is the Modern Chinese pronunciation, which is different from that of Middle Chinese. However, the operation of the fanqie division system is the same.
11. Examples (4a–e) are from Shen (1992), who uses pinyin, not IPA, but are presented here in IPA; the prenuclear glides are presented as [j] and [w].
12. HPSG (head-driven phrase structure grammar) is a model of generative grammar that incorporates linguistic information using the same structures—that is, Typed Feature Structures (TFSs)—in multiple linguistic dimensions, including the phonological, lexical, semantic, syntactic, and textual levels (see Yang & van de Weijer, 2021, p. 1).
13. GV [uo] can be preceded by any nonlabial consonant except a laminal sibilant [tɕ], [tɕh], or [ɕ], which is constrained by AGREE-CV(L) in (7).
14. Among the five Chinese phonemic vowels (i y u ə a) (see Zhang, 2020, p. 78), /y/ is a marked vowel which also behaves very differently from other high laminal vowels. The Cy combination is very much restricted, disallowing */py/, */phy/, */my/, */ty/, and */thy/, while the vowels /i/ and /u/ are fine after these consonants.