Geoffrey K. Pullum, University of Edinburgh, School of Philosophy, Psychology and Language Sciences


English is both the most studied of the world’s languages and the most widely used. It comes closer than any other language to functioning as a world communication medium and is very widely used for governmental purposes. This situation is the result of a number of historical accidents of different magnitudes. The linguistic properties of the language itself would not have motivated its choice (contra the talk of prescriptive usage writers who stress the clarity and logic that they believe English to have). Divided into multiple dialects, English has a phonological system involving remarkably complex consonant clusters and a large inventory of distinct vowel nuclei; a bad, confusing, and hard-to-learn alphabetic orthography riddled with exceptions, ambiguities, and failures of the spelling to correspond to the pronunciation; a morphology that is rather more complex than is generally appreciated, with seven or eight paradigm patterns and a couple of hundred irregular verbs; a large multilayered lexicon containing roots of several quite distinct historical sources; and a syntax that despite its very widespread SVO (Subject-Verb-Object) basic order in the clause is replete with tricky details. For example, there are crucial restrictions on government of prepositions, many verb-preposition idioms, subtle constraints on the intransitive prepositions known as “particles,” an important distinction between two (or under a better analysis, three) classes of verb that actually have different syntax, and a host of restrictions on the use of its crucial “wh-words.” It is only geopolitical and historical accidents that have given English its enormous importance and prestige in the world, not its inherent suitability for its role.

1. Introduction

Contemporary standard English is almost certainly the most studied language in all of human history. The time when Latin would have held that title is surely several hundred years in the past. Thousands of English grammars and books on the history and structure of the language have been published since the late 16th century, and as linguistics expanded dramatically in Britain and America during the 20th century (especially the second half), the study of English by descriptive and theoretical linguists grew explosively. One consequence of this is that, naturally, an article of this size cannot possibly provide a general guide to the enormous scholarly literature. The few bibliographical references given are intended solely for attribution of quoted remarks or specific claims; they do not constitute anything like a representative sample or a balanced reading list.

It has become a cliché that English, with something like half a billion native speakers and a comparable number of second-language speakers, plus hundreds of millions of other less expert users, is now essentially a global language, the language of planet Earth. This does not mean it is used everywhere: the traveler who expects to find English widely understood in provincial Brazil, southern China, or central Russia will be disappointed. Nonetheless, English is vastly closer to having global status than any other of the world’s roughly 7,000 languages has ever been. A Hungarian physicist explaining a German research breakthrough to French, Russian, and Chinese colleagues at a conference in Japan will have no hesitation about the language in which to address them: no one would expect anything else but English to be used. English is so firmly established as the language for scholarly and scientific communication that several non-English-speaking countries in Europe are beginning to make English the recommended or even required language for doctoral theses and academic publication, and the medium of instruction for many courses, especially at the graduate level. Today about 90% of scientific publications worldwide are in English.

English also has a role as official or governmental language to an extent that no other language has ever even approached. It is the main language of government, or at the very least commonly used for official purposes, in something like 60 of the world’s 192 countries. It has no rival as the language of government in the United States, the United Kingdom, Canada, Australasia, and most of the Caribbean. It is widely employed for government purposes in India, many countries in Africa and the Arab world, and the European Union. This immediately gives it huge importance for at least two billion people, which is more than a quarter of the earth’s population.

This remarkable preeminence has arisen over as little as three or four hundred years. In 500 ce the ancestor of English was just a collection of insignificant coastal offshoots of Western Germanic—the dialects of the Angles, Saxons, and Jutes in what is now northern Germany and the Netherlands. What eventually put English into the position it holds today was a combination of fortuitous circumstances, not any inherent virtue or suitability. The language benefited from historically accidental events that encouraged its spread—happy accidents, one might say, were it not for the fact that some of the events were hardly happy: devastating wars, ferocious episodes of imperialism, and huge crimes against humanity like the African slave trade, the dispossession of the aboriginal inhabitants of Australia, and the slaughter of the native peoples of North America.

The imperialist expansion that created the British Empire involved the colonization of all of North America, all of Australasia, most of the Caribbean, most of South Asia (what now comprises countries like India, Pakistan, Sri Lanka, Bangladesh, Myanmar, Malaysia, and Singapore), about half of Africa (Egypt, Sudan, Uganda, Kenya, Tanzania, Zambia, Zimbabwe, South Africa, Swaziland, Namibia, Nigeria, Ghana, Sierra Leone), and many other countries and territories: Eire, Israel/Palestine, Iraq, Hong Kong, Malta, Borneo, Sarawak, the Philippines, Papua New Guinea, Fiji, Yemen, and the Arabian emirates, and so forth. In all such places, administration and government continue to show the influence of English.

Although at one time German stood a fair chance of becoming the language of the nascent United States, and French the language of Canada, English ultimately emerged as the overwhelmingly dominant language for the majority of the population of postcolonial North America. The United Kingdom and the United States were on the victorious side in both the First and Second World Wars of the 20th century, further establishing the significance of the language they shared. English has steadily replaced French as the language of diplomacy, and has been adopted as the basis for the controlled registers used in communications by the crews of ships and planes when speaking to each other or to controlling authorities: no one can become an airline pilot or captain a ship without a fair command of English.

The entertainment, communications, and high technology industries also became relevant to the spread of English. The rise of Hollywood resulted in the emergence of the first place in the world where movies were made for budgets exceeding $100 million and successfully distributed worldwide—always with English as the language of virtually all the dialogue. Radio and television were first fully developed in Britain and the United States, and the BBC became a respected worldwide broadcaster. The publishing industry’s production of newspapers, magazines, and books led to more publications in English than any other language. Today English is unrivaled as the dominant language of print publication, radio, television, cinema, telecommunications, software, and the Internet.

2. The Remarkable Unsuitability of English

The global dominance of the English language, which in some parts of the world sadly threatens to drive minority languages into extinction, could never be argued to have a basis in linguistic properties that made it suitable for its role. There is a long tradition of prescriptive usage advice on the use of the English language, and the advice-givers frequently allude to properties like being logical or orderly (Heffer, 2010 is a paradigm example, alluding to logic half a dozen times in just the Prologue). But those who imagine that English is well suited to a role as lingua franca by virtue of its simplicity, logicality, regularity, clarity, or ease of learning have not examined the linguistic evidence. Triumphalist anglophiles might argue otherwise, but English could never have been picked as the planet’s chief language if a well-informed committee had made the decision on linguistic grounds alone.1 This topic—the linguistic unsuitability of English for a role as world lingua franca—is worth elaborating in terms of (at least) pronunciation, orthography, lexicon, morphology, and syntax, as such an elaboration provides a useful overview of several interesting features of the language.

2.1 Pronunciation

At the phonetic level, English is cursed with ridiculous consonant clusters at which speakers of huge numbers of other languages would balk: an utterance like the sixth spring strictly has six consecutive consonants in careful speech (five of them obstruents): [ksθspr]. In Our strengths spring from our unity we find up to seven: various simplifications occur, but one careful pronunciation would have [ŋkθsspr]. Most of the population of the world would struggle to produce such phonetic monstrosities.
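The cluster counts above can be checked mechanically. The following is a minimal sketch, assuming rough broad transcriptions of the two example phrases (the transcriptions, the small vowel set, and the function name are illustrative assumptions, not part of any standard tool); it simply finds the longest run of non-vowel symbols, ignoring word boundaries.

```python
# Hypothetical, deliberately small set of vowel symbols; a real IPA
# inventory would need many more entries (and diacritic handling).
VOWELS = set("aeiouɪɛæʌʊəɔɑɜ")

def longest_cluster(ipa):
    """Length of the longest run of consonant symbols, ignoring spaces."""
    best = run = 0
    for symbol in ipa:
        if symbol.isspace():
            continue              # a word boundary does not break a cluster
        if symbol in VOWELS:
            run = 0               # a vowel ends the current cluster
        else:
            run += 1
            best = max(best, run)
    return best

# Assumed careful-speech transcriptions of the two examples in the text.
print(longest_cluster("ðə sɪksθ sprɪŋ"))   # 6: k s θ s p r
print(longest_cluster("strɛŋkθs sprɪŋ"))   # 7: ŋ k θ s s p r
```

The sketch treats every non-vowel symbol as one consonant, so it would miscount transcriptions that use length marks, stress marks, or affricate digraphs.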

Most dialects of English have nine or ten fricatives: [f θ s ʃ h v ð z ʒ] plus marginally [x] in words like Bach and loch. This is a large number (some languages have no fricatives at all). The distinction between labiodental [f v] and dental or interdental [θ ð] among the fricatives is an unusual one (and it is of course partially absent in the dialect of East London normally known as Cockney, where [f] frequently replaces [θ], and [v] sometimes replaces [ð]: see Sivertsen, 1960). The distinction between alveolar [s] and palato-alveolar [ʃ] is a real problem for speakers of languages having only the former, like Icelandic; the clear contrast between voiced and voiceless stops in initial, medial, and final position is a struggle for speakers of many other languages (e.g., German, Polish, and Russian); the distinction between rhotic [r] and lateral [l] is a difficulty for speakers of Chinese and Japanese; and the crucial importance of word accent and sentence stress differentiates English from many languages where syllables differ much less in amplitude.

Turning to the vowels, whereas huge numbers of the world’s languages have systems of just five vowels like Spanish [i e a o u], or systems not too radically different from that, most dialects of English employ an unusually large repertoire of about 20 vocalic nuclei, with a mix of short pure vowels, long pure vowels, rising and falling diphthongs, and triphthongs, including some quite rare vowels like British English [ɜ] (American English [ɝ]). Moreover, the vocalic inventory varies strikingly between dialects. Approximations to the pronunciation of English vocalic nuclei by speakers of such languages are very rough, with the result that English as spoken by speakers of some other languages can be strikingly hard to understand: native speakers of another language may pronounce modal identically with model; bird with bed; seat with sit; pull with pool; and so on.

It should also not be forgotten that instead of having a fairly even pronunciation of successive syllables with only light modification due to accentuation or rhythm, English has a massively complex pattern of word stress that is absolutely crucial to intelligibility, and interacts with the vowel alternation system. Pronouncing a word like constituency with roughly equal amplitude on all syllables makes it close to unintelligible in connected speech. Pronouncing phótograph, photógraphy, and photográphic correctly involves controlling two different pronunciations of each of the syllables pho-, -tog-, and -graph-, and three different locations for the heaviest stress, and knowing which goes with which, in which word.

To some extent systematic principles are involved here: although words like *photographology and *photographological do not exist (I mark them with the traditional asterisk that linguists use for both hypothesized reconstructions and ungrammatical sequences), native speakers know exactly where they would place the heaviest stress in such words (on the ‑phol- of the first and the ‑log- of the second), and how they would pronounce the vowels. Inexpert non-native speakers do not. But this does not mean that there is a stress rule that could be taught to second-language learners: most phonologists today would agree that the attempt by Chomsky and Halle (1968) to describe English pronunciation in full without ever marking stress position in a lexical entry was quixotic.

2.2 Orthography

The English spelling system may rank among the hardest writing systems in which to become fluently literate. The Chinese character system is of course the hardest, and Japanese is also highly complex, as are Thai and Cambodian; but for an alphabetic system with 26 letters, English is astonishingly difficult, being grotesquely inconsistent in both its sound-to-spelling and its spelling-to-sound mappings.

The examples are familiar: the verb winded doesn’t rhyme with minded; nor does near rhyme with pear, or (in British dialects) cover with hover or over. The first vowel in bother is nothing like the vowel of both (and the fricatives are different too), and brother has yet another different vowel. Finger has a /g/ in it but singer (as pronounced in the United States or southern England) does not. Pillar, filler, Alistair, victor, and sulfur all have the same vowel in the final syllable, but it is spelled differently in each case. The vowels of cough, tough, dough, and through are all different, and none of them is the same as the one in the second syllable of thorough. The words herd, bird, word, and curd are exact rhymes but have four different vowel letters. And so on for hundreds of such anomalies.

It is actually hard to find any letter of the alphabet that unambiguously represents one specific sound. And it is easy to find English phonemes that have ten or more spellings: for British dialects at least, /ju/ is spelled in 11 different ways in beautiful, eulogy, queue, pew, ewe, lieu, mu, cute, cue, Pugh, and you; and in all dialects /k/ is spelled in 11 different ways in cord, accord, chord, saccharine, lock, key, khaki, Nikki, Iraq, quiche, and Urquhart. For an alphabetical writing system that is supposed to be based on the phonetic/phonological form of words, this sort of chaos is truly extreme.

Chomsky and Halle (1968) make some ingenious attempts at defending English spelling, arguing that it reflects an underlying phonological reality rather than superficial phonemics. For example, they argue for representing resign as /resign/ (1968, pp. 233–234), with phonological rules responsible for tensing /ɪ/ to /i:/ before velars, deleting velar fricatives before final nasals, and vowel-shifting /i:/ to /ai/, so that the form of resign can be related to that of resignation (which has a phonetic /g/). They even suggest that giraffe should have the underlying form /giræffe/ (1968, pp. 150–152), with the rules placing stress on the penultimate syllable before the final e is deleted and the geminate reduced. But one might reasonably object that Chomsky and Halle’s rules to some extent recapitulate history, thus confusing outdated aspects of the spelling system with diachronic understanding of the last five or six centuries of sound changes.

A serious attempt to lay out the rules of English spelling is made by Rosenfelder (2000). He gives 56 rules for predicting pronunciation directly from standard spelling (rules that he has implemented in a freely available computer program), and he claims that they are respected in 85% of English words. One thing remains quite clear, however: learning to spell English words is a vastly more complex business than learning how to spell Swahili or Italian or Finnish.

One valid point to make against trying to reform English spelling is that it would have to privilege one dialect over the others: no attempt to spell English phonemically could work for every dialect. Good and blood have the same vowel in northern England but different vowels in Scotland; good and goose have the same vowel in Scotland but different vowels in northern England; and in southern England all three words (good, blood, and goose) have distinct vowels. Mary, marry, and merry are distinct in southern England but are merged in some American dialects; Don and Dawn have the same vowel in western U.S. dialects; and so on. But this merely indicates an equal-opportunity aspect to the dreadfulness: nowhere on earth do people speak the English language in a form that yields a simple correlation between the pronunciations of words and the standard ways of spelling them. The phonics method of teaching children to read, which is a long-standing, very successful educational experiment, works despite the dreadfulness, not because of it.

2.3 Lexicon

English has a huge total vocabulary built up over a thousand years of literacy (see Algeo, 1999 for a thorough survey), and it contains words from several distinct overlaid strata, the primary ones (with very roughly a quarter of the vocabulary each) being Germanic, French, and Latin.

It is worth noting, however, that the vocabulary size of English is commonly overstated and over-interpreted. Meyer (2014), in a popular book on transcultural communication, claims that English has 500,000 words but French has only 70,000, and links this (rather bafflingly) to the degree of dependence on context found in conversations in the two languages—as if the French have to do more subtle implying of their intended meanings because they are short of words.

The 500,000 estimate for English seems wildly exaggerated no matter what counting method is used. For Meyer’s purposes we would surely want to count lexemes rather than word forms: take, takes, took, taken, and taking should count as forms of a single English word (the correspondent of French prendre), not as five different words. But counting lexemes over the whole history of English, The Oxford English Dictionary finds only about 170,000 that could possibly be said to be in common use today. The American Heritage Dictionary makes it more like 300,000 words, but that is mainly because it is much more encyclopedic, giving entries for surprisingly large numbers of multiword phrases that are not really idiomatic and do not need a separate lexical entry (from abominable snowman to zoological garden). It also lists large numbers of personal names of historically significant people (from Aaron to Zworykin), and place names (from Aachen to Zwolle). But there is no published dictionary listing 500,000 words. Moreover, huge numbers of the words listed in big dictionaries (words like aa or zythum that are completely unknown to most speakers) are an irrelevance when considering use of the language, since no speaker’s verbal behavior could be influenced by words that are never encountered.

When we examine running text and treat each distinct string of letters as a new word (which not only counts word forms of a given lexeme separately but also allows in all sorts of flotsam and jetsam such as roman numerals), a million word tokens of typical novels tend to yield no more than about 30,000 distinct word forms (a number that would slowly increase as we extend the corpus, since despite massive overlap each novel would contain a few additional rare words that none of the others had used).
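The difference between counting tokens, distinct word forms, and lexemes can be made concrete with a small sketch. The tokenizer below is a simplifying assumption (runs of ASCII letters, optionally with internal straight apostrophes), and the tiny form-to-lexeme table is hypothetical and hand-made purely for illustration; a real count would need a full morphological lexicon.

```python
import re

# Simplified tokenizer (an assumption): lowercase runs of ASCII letters,
# allowing internal straight apostrophes as in "don't".
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)*")

# Hypothetical hand-made table mapping inflected forms to a citation form.
LEXEME_OF = {"takes": "take", "took": "take", "taken": "take", "taking": "take"}

def tokens(text):
    """All word tokens in running text, in order."""
    return TOKEN.findall(text.lower())

def type_token_counts(text):
    """Return (number of tokens, number of distinct word forms)."""
    toks = tokens(text)
    return len(toks), len(set(toks))

def lexeme_count(text):
    """Number of distinct lexemes, collapsing forms listed in LEXEME_OF."""
    return len({LEXEME_OF.get(t, t) for t in tokens(text)})

sample = "She takes what he took; having taken it, they keep taking more."
print(type_token_counts(sample))  # (12, 12): twelve tokens, all distinct forms
print(lexeme_count(sample))       # 9: the four forms of take count once
```

Run over a corpus of novels, the second figure from type_token_counts corresponds to the count of distinct word forms discussed above, and it is always at least as large as the lexeme count.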

In short, it is very hard to see how any mis-definition of “word” could delude someone into thinking that there are 500,000 words in English.

That said, it is undeniable that the English word stock is rich. Its historical layering often gives us distinct words of quite different vintages and origins associated with a single meaning.

For talking about reptiles, we have snake, which comes from the Germanic roots of English (Old English snaca); serpent and lizard, which entered Middle English from French (Old French serpentin and lesarde); and herpetology (the study of reptiles) via a Renaissance borrowing from Greek (Ancient Greek herpeton).

Relating to the heart, we have the Anglo-Saxon heart (from Old English heorte) but also cordate (going back to Latin cord-) and cardiac (going back to Greek kardia).

Associated with the number 5 we have Germanic five (Old English fif); direct borrowings based on Latin (quinque) like quinquennium; and direct borrowings based on the Greek equivalent (pente) like pentarchy or pentathlon.

In connection with the domestic chicken we have hen, cock, rooster, and chick direct from Old English; pullet and capon from French; and scientific terms like galliform and gallinaceous borrowed directly from Latin (gallus, “cock”).

Often it is important to distinguish words from the different layers in order to separate crude or obscene talk from technical language: along with the standard medical term penis (Latin) we have on one side the vulgar word cock (Old English) and at the other extreme the learned term phallus (Greek).

On top of all this stratal layering of orthographically simple words, English has a large stock of idioms of various lengths with meanings almost totally unpredictable from the meanings of the components:

2 words: let go; make do; lose face; white elephant; wing it; buzz off; . . .

3 words: kick the bucket; spill the beans; eat humble pie; chew the fat; . . .

4 words: raining cats and dogs; elephant in the room; . . .

5 words: back to the drawing board; shooting fish in a barrel; . . .

6 words: put the cat among the pigeons; hit the nail on the head; . . .

7 words: the straw that broke the camel’s back; let the cat out of the bag; . . .

8 words: have eyes in the back of your head; . . .

For coining new words, English has a rich array of mainly suffixing word-formation, with wildly varying productivity, much of it showing traces of Latin and Greek morphology. For example, regular (pertaining to rules) and legal (pertaining to laws) have what is essentially the same suffix, but it is subject to a Latin dissimilation barring -al from being attached to a stem ending in -l. Evade is obviously related to evasion, and erode to erosion, but they involve an alternation between a stop and a fricative that English has inherited from Latin. There are hundreds of other similar partial inheritances.

2.4 Morphology

English has a striking profusion of irregular verb morphology. Approximately 200 of its verbs (dialect differences make it pointless to settle on a precise number), including many of the most frequently occurring, have at least some non-default inflectional forms or missing forms. This should be compared with Swahili, which despite its complex verb structure has almost no irregularity of the same sort—a few idiosyncrasies in the case of the copular verb in the present tense, but nothing like the bewildering array of ways to form the preterite tense that English exhibits. The number of inflectional forms for an English verb may be 1, 2, 3, 4, 5, or (in the unique case of be) 12. And there are at least eight quite distinct paradigm layouts for verbs, if we include the anomalous verb beware, which most speakers use only in the plain form (it has no preterite tense or participles, and for most speakers no present tense). Table 1 shows the seven other verb paradigm shapes. The terminology employed is that of Huddleston, Pullum, Bauer, Birner, and Briscoe (2002; henceforth CGEL). The peculiar ordering of the rows is chosen so that syncretism (identity of phonological realization between paradigmatically distinct forms) can be represented by sharing of cells. Thus the table shows that the modal verb must has the same phonological shape for the first-, second-, and third-person singular and all persons in the plural; the dashes in the rows for the preterite (simple past) tense show that it lacks all preterite tense forms; but like all the modals (except for may in most contemporary dialects), it has a negative form ending in the suffix ‑n’t.2

Table 1. Seven Different Verb Paradigms in Standard English

[Table body not recoverable from the source. Surviving row labels: Number of shapes; 3sg present; 1sg present; 2sg or plural pres; 1sg pres neg; 3sg pres neg; other pres neg.]

An unpleasant fact for the adult learner of English is that some verb paradigms have radical irregularities including suppletion (the verb go has the preterite form went; the verb whose plain form is be has present-tense forms beginning with vowels and preterite forms beginning with w-) or paradigm gaps (the verb must has no preterite; none of the modal verbs have plain forms or participles; stride has no established past participle). In a number of cases the morphology of other languages intrudes (for the plurals of larva and stimulus English borrows the Latin plurals larvae and stimuli; for the plurals of phenomenon and criterion it has the Greek plurals phenomena and criteria).

It has occasionally been suggested that the grammar of contemporary English shows some of the signs of a language that has been creolized in its past, after the Norman conquest of 1066 as a Norman French ruling class took over the country and reduced the Anglo-Saxon population to serfdom. English could be said to have a radically stripped-down inflectional system compared to all the other Germanic languages. But a creolization hypothesis for modern English would be highly controversial. Creoles typically have essentially no inflection whatever, whereas English, as we have seen, while far less conservative than Icelandic, could not possibly be described as simple. Though sometimes spoken of as having little grammatical inflection to learn, English has enough complexity in this domain that the inflection chapter of CGEL occupies 55 pages.

2.5 Syntax

In at least one respect English has a globally widespread and popular syntactic property: it is a Subject-Verb-Object language, like nearly every Western European language (the Celtic family being the exception), and also Chinese, most languages of Southeast Asia (Khmer, Lao, Thai, Vietnamese), many languages of Africa (Hausa, Igbo, Swahili, Yoruba, Zulu), most modern Semitic languages (colloquial Arabic, modern Hebrew), and thousands of others—about 3,000 languages in all are SVO, making it the second most common default order of clause constituents (only SOV tops it), and it is all but universal among Creole languages. However, as we view the syntax more closely and in more detail, plenty of complexity emerges. I will review just three areas that display this rather clearly.

2.5.1 Verbs, Prepositions, and Prepositional Verbs

English has a rather rich system of subcategorization for lexical heads: it is unavoidably necessary to classify verbs in particular (but also adjectives, nouns, prepositions, and even adverbs) with regard to the number and category of the complements they take. A brief and incomplete sample list of verbs (I write “of-PP” to mean “PP with of as its head preposition,” and so forth):


takes an obligatory object NP


takes either no complement or an NP or an of-PP (different senses)


takes either one NP or two NPs (different senses)


takes an NP and a to-PP


takes an optional object NP


takes no complements


takes an NP and an optional with-PP (different senses)


takes either two NPs or an NP and a to-PP


takes either no complement or an at-PP


takes a for-PP


takes a with-PP


takes an of-PP
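The kind of classification described in this list can be pictured as a lookup table from verbs to the complement sequences they license. The sketch below is only an illustration under stated assumptions: the four verbs are my own choices of familiar textbook cases, not the verbs of the sample list above, and each entry is simplified to the frames named in its comment.

```python
# Each verb maps to the complement sequences it licenses.
# The verbs are illustrative assumptions, not the article's own sample list.
FRAMES = {
    "devour": [("NP",)],                        # obligatory object NP
    "eat":    [(), ("NP",)],                    # optional object NP
    "elapse": [()],                             # no complements
    "give":   [("NP", "NP"), ("NP", "to-PP")],  # two NPs, or an NP and a to-PP
}

def licensed(verb, complements):
    """True if the verb's (simplified) entry allows this complement sequence."""
    return tuple(complements) in FRAMES.get(verb, [])

print(licensed("devour", []))               # False: the object is obligatory
print(licensed("eat", []))                  # True: the object is optional
print(licensed("give", ["NP", "to-PP"]))    # True
print(licensed("elapse", ["NP"]))           # False: no complements allowed
```

Even this toy table shows why the information has to be stored verb by verb: nothing about the meanings of devour and eat predicts that only the second may drop its object.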

There is a related feature of English syntax that seems so familiar to native speakers that they do not perceive its complexity or the difficulty it poses for foreign learners. In addition to its host of tricky preposition government issues, it has a riot of prepositional verb constructions, both idiomatic and compositional, which are of high frequency and huge importance in everyday conversational use of the language. Learning several hundred of these is unavoidable, and their syntax is by no means straightforward.

Six syntactic patterns involving verbs closely allied with specific prepositions need to be distinguished:


(I) Verb followed by PP consisting of Preposition plus object NP: I saw [to the necessary clean-up].

(II) Verb followed by object NP followed by PP: I attributed [his collapse] [to the intense heat].

(III) Verb followed by two PPs: I looked [to her] [for assistance].

(IV) Verb followed by PP consisting of Preposition plus predicative complement: It won’t count [as a research publication].

(V) Verb followed by object NP followed by PP containing a predicative complement: They won’t regard [that] [as a research publication].

(VI) Verb followed by PP with object NP, followed by PP containing a predicative complement: We think [of it] [as a perk of the job].

Within each type, there are syntactically idiosyncratic special cases. For example, within type I, we find combinations like count on, which can appear in passive form (someone you can count on / someone who can be counted on) and others that cannot (The idea slowly grew on me / *I was slowly grown on by the idea); and we find combinations like testify to, which can be disrupted by wh-preposing (something I can testify to / something to which I can testify) and others that are (as it were) syntactically frozen in place (something I never got over / *something over which I never got).

The same sort of variation is found in type II: accuse someone of something allows preposing of the PP (the crime of which he had been accused) but get someone through something, belonging more to informal style, does not (as seen with *the bereavement through which her faith got her).

In type III the verb selects both prepositions, and meaning plus common sense would not suffice to predict what the preposition choices should be: it is look to someone for something, not *look at someone to something; agree with someone about something, not *agree to someone of something; and so on (errors on such matters by non-native speakers are extremely common).

In scores or perhaps even hundreds of cases, a particular verb together with a particular preposition in one of the constructions in (I)–(VI) is interpreted idiomatically. It has often been suggested that in those cases the verb and the preposition constitute a single verb that is lexically listed with the special meaning, for example, that in come across + NP with the meaning “encounter NP fortuitously” the sequence come across could be listed as a verb spelled with an internal space meaning “encounter fortuitously.” For a brief argument against such analyses see CGEL, p. 277, and for a more detailed examination, Baltin and Postal (1996).

The issue of the so-called verb-particle construction brings in a new complexity. First, notice that there are prepositions like in and up that take an NP complement that is optional (Take this up the stairs; Take this up), but there are others that never take NP complements. Some of the latter originate as compounds of what used to be a preposition and its object (up + stairs > upstairs), but in other cases there is no such origin: away and back are examples of monomorphemic prepositions that never take an NP (CGEL, chapter 7; for detailed arguments in favor of this view, see Emonds, 1972).

Let us call a preposition intransitive if it does not take an NP complement, and transitive if it does. There are two classes of intransitive prepositions: regular ones, which head one-word phrases that have to go where PPs go, and light or mobile ones, which have another option. The light ones, commonly known as particles, can be positioned before the object of a verb, provided it is not a pronoun. Upstairs is the regular kind of intransitive preposition, but up is the light kind. Thus we find this contrast:

Take that box up when you go.
Take that box upstairs when you go.
Take up that box when you go.
*Take upstairs that box when you go.
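The contrast can be restated as a small procedure. The following Python sketch is purely illustrative: the two word lists contain only items classified in the text, and the function names `can_precede_object` and `orders` are invented here for exposition rather than drawn from any published analysis.

```python
# Toy lexicon: only items the text itself classifies (an assumption that
# nothing else is needed for the illustration).
LIGHT = {"up", "down"}      # light intransitive prepositions ("particles")
REGULAR = {"upstairs"}      # regular intransitive prepositions


def can_precede_object(prep: str, object_is_pronoun: bool) -> bool:
    """A light preposition may precede the object unless the object is a pronoun."""
    return prep in LIGHT and not object_is_pronoun


def orders(verb: str, prep: str, obj: str, object_is_pronoun: bool = False) -> list[str]:
    """Return the permitted orderings of verb, object, and intransitive preposition."""
    # The post-object position is open to both kinds of intransitive preposition.
    result = [f"{verb} {obj} {prep}"]
    # The pre-object position is open only to the light kind.
    if can_precede_object(prep, object_is_pronoun):
        result.append(f"{verb} {prep} {obj}")
    return result


print(orders("take", "up", "that box"))
print(orders("take", "upstairs", "that box"))
print(orders("take", "up", "it", object_is_pronoun=True))
```

The sketch captures only the positional contrast; it says nothing about the other syntactic differences between particles and regular prepositions.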

There are sharp differences in syntactic behavior between regular prepositions (whether transitive or intransitive) and the so-called particles (see CGEL, pp. 280–283), though things are complicated by the existence of a few cases of particles that are homonyms of regular prepositions (shout down the proposal has the particle down, but shout down the phone has down as a transitive preposition) and a few cases of homonymy between a verb + particle combination and a verb + transitive preposition sequence with somewhat similar meanings (see through the scheme has the verb + particle structure when it means “see the scheme through to completion” but the verb + transitive preposition structure when it means “perceive the true nature of the scheme”).

With all the construction types (I)–(VI), and with the verb-particle construction, there can be idiomatic meanings assigned, and there are hundreds of them (see CGEL, pp. 286–290 for a brief introduction).

The point being made with all this discussion of prepositions, prepositional verbs, and particles is that the syntax and lexicon of English are vastly more complex than one might at first perceive. Traditionally English has been spoken of as having hardly any grammar (or even “no grammar”) by people who see the inflection of nouns and verbs in Latin or the case system of Finnish as grammatical complexity; but they overlook the extraordinarily complicated nature of the grammatical restrictions found in the familiar sequences of verbs and prepositions that occur in almost every sentence English speakers utter.

2.5.2 The Auxiliary Verb System

English is unlike most languages in having a special subset of verbs that exhibit a strikingly different syntax. There is a huge majority of verbs—an open class containing thousands of them—that can appear in ordinary affirmative declarative independent clauses with the default SVO constituent order (I will call these canonical clauses) but can never take special roles like preceding the subject in an interrogative, or taking a following not to negate the clause. A small, closed class of a dozen mostly irregular verbs can appear either in these special environments or in canonical ones. And there is one verb, an intransitive use of do, that is required to be in a special environment. CGEL calls the first class of verbs the lexical verbs. Verbs in the second class are generally known as the auxiliaries, and they are: be, can, dare, have, may, must, need, ought, should, and will.

As described in formal detail in Sag et al. (2020), several constructions depend crucially on this distinction.

Verb-initial clauses like closed interrogatives (May we come in?) or those with preposed negative adjuncts (Never did I imagine anything like that) can only be formed when the verb is an auxiliary (*Came they in?; *Imagined you anything like that?).

A clause can be negated by a not following an auxiliary verb (They have not given permission), but this is impossible with a lexical verb (*They gave not permission).

The polarity of a clause can be stressed by placing heavy accent on an auxiliary verb (He hás made a lot of trouble contradicts “He hasn’t made a lot of trouble”), whereas heavy accent on a lexical verb tends to suggest that the speaker is arguing against the use of a different verb (He máde a lot of trouble contradicts “He experienced a lot of trouble” or “He saw a lot of trouble,” and so on).

An auxiliary can end a verb phrase, the semantics of the verb phrase being completed by filling in the meaning of some phrase denoting a property salient in the context as if it were the complement of the auxiliary: You may not believe that I’ve swum the channel, but I have ____ is most naturally interpreted with a meaning incorporating “but I have swum the channel.” A lexical verb does not behave in the same way: *You may not believe that I enjoyed swimming the channel, but I enjoyed ____ is not even grammatical.

And so on. In many languages, the syntactic properties of verbs apply to verbs quite generally: they all fill the same role such as heading the verb phrase or beginning the clause. Having a special subset with quite different syntactic properties (notice, English has verb-initial declarative clauses but only when the verb is one of the auxiliaries) is not so common, and clearly adds complexity.
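The three-way division just described can be summarized schematically. The Python sketch below is an illustration under stated assumptions: the auxiliary list is copied from the text as given, the environment labels and the `special_do` flag are names invented here, and any verb outside the list is simply treated as lexical.

```python
# Auxiliaries exactly as listed in the text.
AUXILIARIES = frozenset(
    {"be", "can", "dare", "have", "may", "must", "need", "ought", "should", "will"}
)

# Hypothetical labels for the special environments discussed above.
SPECIAL = frozenset({"pre_subject", "not_negation", "polarity_stress", "vp_final"})
CANONICAL = "canonical"


def environments(verb: str, special_do: bool = False) -> set[str]:
    """Return the clause environments in which a verb may head the verb phrase.

    special_do=True marks the use of "do" that the text says is required
    to be in a special environment.
    """
    if special_do:
        return set(SPECIAL)                # special environments only
    if verb in AUXILIARIES:
        return {CANONICAL} | set(SPECIAL)  # either kind of environment
    return {CANONICAL}                     # lexical verbs: canonical clauses only


print(sorted(environments("may")))
print(sorted(environments("come")))
print(sorted(environments("do", special_do=True)))
```

Treating polarity stress as a yes-or-no environment is a simplification: as noted above, heavy accent on a lexical verb is possible but receives a different interpretation.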

2.5.3 Interrogative and Relative Words

Now briefly consider the so-called wh-words in English. These have four primary uses: they appear in open interrogatives (like the independent clause What have they done to the PA system? or the subordinate clause following wonder in We wondered what they had done to the PA system), integrated (“restrictive”) relative clauses (as in the NP the person who is in charge of the PA system), supplementary (“nonrestrictive”) relative clauses (as in the expression Pat Riley, who is in charge of the PA system), and fused relative constructions (like the NP what little sound the PA was able to produce). But the pattern is mysteriously irregular. I will give just a few examples of the irregularity.

Who appears freely in three of the constructions, but is archaic or rare in fused relatives: we find Who steals my purse steals trash in Shakespeare, and there seems to be an unusual survival in the coffee-shop locution Can I help who’s next?, but in general expressions like ?*You should talk to who proposed it are not used (the analog with whoever is used instead).

Whom, the accusative form of the who lexeme, is in common use in both kinds of relative clause (the person to whom it was addressed), but sounds unbearably pompous in open interrogatives (nobody normally says ??Whom should I see about returning this?—the frequency of clause-initial whom in conversation is approximately zero), and it is impossible in a fused relative (*Whom they hired failed to complete the job).

Whose with human reference is used in interrogatives and relative clauses but not fused relatives (*Whose dog attacked my child will be hearing from my attorneys).

Whose with nonhuman reference is different: it occurs in relative clauses (Oil the doors whose hinges squeak) but not in interrogatives (I’ve got the oil can. *Whose hinges squeak?).

What is standard for open interrogatives asking questions about nonhumans (What are you looking at?) but is strikingly and stereotypically ungrammatical (a hallmark of certain nonstandard dialects) in both integrated relative clauses (*the car what you were looking at) and supplementary ones (*It was made of plastic, what really annoys me), though it is fine in fused relative NPs (What attacked the child must have been some kind of carnivorous animal).

Which is freely usable in interrogative and relative clauses, but not in fused relatives (*Which attacked the child must have been some kind of carnivorous animal).

Where occurs in all four types of construction.

When, by contrast, seems not to be usable in fused relatives (*When major sports events are shown is expensive for purchasing TV advertising time).

How is perfect in interrogatives (How is this done?) but ungrammatical in standard English (again, stereotypically nonstandard) in relative clauses (*Let me demonstrate the way how it’s done), and in fused relatives (*How they make that stuff is illegal in this county).

The point to be made about all of this is that English does not have anything like a simple principle that its wh-words are used to introduce both open interrogatives and relative clauses. Instead, we find an intricate pattern of permissible and impermissible uses of different wh-words in different types of construction. This is a further example of the way English grammar conceals its complexity not in the endings of inflected words but in apparently random irregularities of co-occurrence in the most basic and familiar syntactic constructions. There are some broad regularities, but a closer look reveals a ragged pattern with apparently random gaps.
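The raggedness of the pattern is easiest to see when the judgments reported above are laid out as a table. The Python sketch below does this under several assumptions: the three-valued coding (`"ok"`, `"marginal"`, `"no"`) is a simplification introduced here, "relative clauses" without qualification is taken to cover both the integrated and the supplementary kind, and `None` marks cells on which the discussion above makes no explicit statement.

```python
OK, MARGINAL, NO = "ok", "marginal", "no"

CONSTRUCTIONS = (
    "open_interrogative",
    "integrated_relative",
    "supplementary_relative",
    "fused_relative",
)

# One row per wh-word, one cell per construction, in the order given above.
WH_WORDS = {
    "who":              (OK,       OK,   OK,   MARGINAL),  # fused use archaic or rare
    "whom":             (MARGINAL, OK,   OK,   NO),        # pompous in interrogatives
    "whose (human)":    (OK,       OK,   OK,   NO),
    "whose (nonhuman)": (NO,       OK,   OK,   None),      # fused use not discussed
    "what":             (OK,       NO,   NO,   OK),
    "which":            (OK,       OK,   OK,   NO),
    "where":            (OK,       OK,   OK,   OK),
    "when":             (None,     None, None, NO),        # only the fused gap is stated
    "how":              (OK,       NO,   NO,   NO),
}


def fully_available() -> list[str]:
    """Wh-words recorded as acceptable in all four constructions."""
    return [w for w, row in WH_WORDS.items() if all(cell == OK for cell in row)]


def gaps(word: str) -> list[str]:
    """Constructions in which the given wh-word is recorded as impossible."""
    return [c for c, cell in zip(CONSTRUCTIONS, WH_WORDS[word]) if cell == NO]


print(fully_available())
print(gaps("what"))
print(gaps("how"))
```

On this coding, only where comes out as acceptable across all four columns, and no two of the remaining rows are filled in identically.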

It would have been reasonable, logical, and economical for all wh-words to be uniformly available in all the relevant constructions. It would also have been very reasonable to have a distinct series of words for each different construction, which is something like what we find in Hindi (a distant relative of English genetically), where interrogative words begin with k, relative words begin with j, demonstrative words begin with t, and so on. But English followed a different path. Some uses of wh-words have caught on in particular constructions and others have not.

This is not, of course, an argument that accurate description of the resultant system is impossible. We can analyze the properties that interrogative and relative constructions share, abstracting away from the details of wh-word idiosyncrasies as necessary for a particular explanatory purpose. But the gaps and inconsistencies exemplified here are irreducible syntactic irregularities, which have to be learned somehow by every native speaker even though they do not hold universally but are clearly parochial. Indeed, sometimes nonstandard dialects differ from the standard dialect in the relevant respect, as previously shown.

3. Conclusion

This brief article on what is probably the most intensively examined language in human history has obviously not provided, and could not possibly have provided, a systematic survey of the interesting features of English as contrasted with other languages. What it has tried to do is just to highlight one rather iconoclastic point, namely that English could never have been convincingly claimed to deserve its unprecedented global standing on grounds that it was the most suitable language to pick. English has a large and rather unwieldy multilayered vocabulary; it presents a variety of phonetic problems that make it difficult for speakers of other languages to master a good pronunciation; its morphology and phonology are more complex than they are reputed to be; and it has a syntax that should never be represented as following naturally from simple logical principles, in the way many misguided usage writers have suggested. It is a tricky, irregular, and complex language in a plethora of ways. The fact that it has spread around the world and is spreading further suggests that many hundreds of millions of people have spent many hundreds of billions of hours struggling to master it. Native speakers of English, who inherit such a useful ability to get around the world without significant communication difficulties, owe them a debt of gratitude.

