Case Markers in Indo-Aryan

  • Miriam Butt, University of Konstanz


Indo-Aryan languages have the longest documented continuous historical record, with the earliest attested texts going back to 1900 bce. Old Indo-Aryan (Vedic, Sanskrit) had an inflectional case-marking system where nominatives functioned as subjects. Objects could be realized via several different case markers (depending on semantic and structural factors), but not the nominative. This inflectional system was lost over the course of several centuries during Middle Indo-Aryan, resulting in just a nominative–oblique inflectional distinction. The New Indo-Aryan languages innovated case markers and developed new case-marking systems. As in Old Indo-Aryan, case is systematically used to express semantic differences via differential object marking constructions. However, unlike in Old Indo-Aryan, many of the New Indo-Aryan languages are ergative and all allow for non-nominative subjects, most prominently for experiencer subjects. Objects, on the other hand, can now also be unmarked (nominative), usually participating in differential object marking. The case-marking patterns within New Indo-Aryan and across time have given rise to a number of debates and analyses. The most prominent of these include issues of case alignment and language change, the distribution of ergative vs. accusative vs. nominative case, and discussions of markedness and differential case marking.


1. Introduction

1.1 Indo-Aryan Languages

Indo-Aryan languages are mainly spoken in South Asia, but also across the world due to the South Asian diaspora. They are a branch of Indo-Iranian, which in turn is a branch of Indo-European. The written record for Indo-Aryan goes back to about 1900 bce and is the longest documented continuous written record available for historical linguistics; see Table 1, from Butt and Deo (2017, p. 530).1

Table 1. Indo-Aryan Chronology

1900 bce – 1100 bce    Early Old Indo-Aryan
1000 bce – 200 bce     Later Old Indo-Aryan
300 bce – 700 ce       Middle Indo-Aryan
1100 ce – present      New Indo-Aryan

Modern Indo-Aryan languages number in the hundreds. Major languages like Bengali, Gujarati, Kashmiri, Marathi, Punjabi, Saraiki, Sindhi, Sinhala, and Urdu/Hindi are each spoken by millions of people and have strong and rich literary traditions. Hindi and Urdu, which are structurally almost identical and will be referred to as Urdu/Hindi throughout this paper, number among the top five spoken languages in the world, with an estimated total of 708 million speakers. Bengali, Punjabi, and Marathi number among the 20 most frequently spoken languages in the world.2 These numbers stand in stark contrast to the amount of work done on these languages within linguistics: Most of the languages other than Urdu/Hindi have fewer than a handful of researchers working on them actively.

One effect of the dearth of researchers is that comparatively little information about the structure of Indo-Aryan languages has been taken into account in typological or formal linguistic work since the beginning of the 20th century.3 Yet information from Indo-Aryan languages is crucial for an understanding of the larger typological and comparative linguistic patterns, particularly with respect to case marking.

1.2 Indo-Aryan Case-Marking Patterns

Case marking patterns in Indo-Aryan are complex. Morphosyntactic, lexical semantic, clausal semantic, and pragmatic factors interact with one another in a nontrivial manner in the determination of case marking. Some Indo-Aryan languages have an ergative case, others do not. The crosslinguistic patterns show interesting microvariation and buck expectations as to, for example, the structure of ergative vs. accusative languages, markedness, or differential case marking (DCM).

Most Indo-Aryan languages do not comply neatly with classic ideas about case alignment across different types of clauses as described, for example, in Dixon (1994) and Blake (2001). However, the typological database available in the World Atlas of Language Structures Online (WALS) (Dryer & Haspelmath, 2013) contains just two data points from Indo-Aryan languages among a total of 190 languages in the chapter on case alignment. These two languages are Hindi and Marathi. Under the WALS classification they are both recorded as being tripartite, whereby the A (agentive subject of transitive sentences), S (subject of intransitive sentences), and O (object) can all be marked differently. This data set does not reflect those Indo-Aryan languages which have no ergative case and it also does not reflect the more nuanced differences among Indo-Aryan languages, as documented for instance in the comparative overview provided by Deo and Sharma (2006). In particular, Hindi and Marathi differ in that all pronouns are marked with the ergative in Hindi whereas only third-person pronouns are marked in Marathi, a distinction that has traditionally been taken to be important in discussions of ergativity, cf. Silverstein (1976).

As another example, take the most recent formal linguistic book on case: Baker (2015). This book contains discussion of exactly one Indo-Aryan language: Urdu/Hindi. Toward the end of the book there is a note that says: “I also admit that I do not have a full analysis of ergative or accusative in Hindi-Urdu ...” (Baker, 2015, p. 291, fn. 5). The problem is that the case-marking patterns in Urdu/Hindi go against expectations of the theory of dependent case argued for by Baker, by which one or another of the two core arguments of a clause should be marked, but not both at the same time (also assumed by theories of markedness). This and other issues are discussed in more detail in this contribution with respect to some of the known patterns in Indo-Aryan.

However, this contribution will not be able to do full justice to the wide variety of data found in Indo-Aryan because there is actually a wealth of data that can be tapped into and that is awaiting thorough analysis. The monumental Linguistic Survey of India (Grierson, 1903–1928) has been followed up by the 2011 Linguistic Survey of India. Online resources are increasingly being built, for example the very comprehensive Digital South Asia Library or the new initiative of The World Atlas of Transitivity Pairs, and more and more synchronic and diachronic corpora are becoming available.

Full-length grammars exist for several Indo-Aryan languages, some of them written as far back as the 1700s. Indeed, given the long attested historical record for Indo-Aryan and the generally very high quality of the grammars and discussions of the last three centuries, much more work could be done within Indo-Aryan historical linguistics.

There are a handful of researchers who do work within Indo-Aryan historical linguistics, and recent years have witnessed a small but growing body of work, particularly with respect to the topics of case, ergativity, and oblique/non-nominative subjects. Before delving into these issues in more detail, it should be mentioned that while this paper is on case in Indo-Aryan, any research on Indo-Aryan languages must necessarily take into account patterns found in the geographically adjacent languages with which the Indo-Aryan languages have been in a situation of language contact for several millennia: the genetically unrelated Dravidian, Munda, and Tibeto-Burman languages as well as the related Iranian languages. The languages of South Asia form a linguistic area (Ebert, 2001; Masica, 1976) in the sense that while the Indo-Aryan languages display interesting patterns of microvariation (also see Peterson, 2017), generalizations at a macro level can be made across South Asian languages (Masica, 1991). With a few exceptions (Kashmiri being a prominent example), South Asian languages generally display SOV word order and function very similarly with respect to patterns of case marking. For example, there is a general tendency to employ DCM. In DCM, clauses differ only with respect to a case marker and thereby express a different semantic import. Case tends to be governed by a combination of structural and lexical semantic factors. Clausal semantics, particularly with respect to the expression of modality, are also involved. None of the languages contains a ‘have’ verb,4 and possession tends to be indicated via genitive or dative/locative constructions.

This section has sought to highlight some of the main issues and areas involved in the study of case in Indo-Aryan. The following sections provide more detailed information on the history of case in Indo-Aryan and concomitant issues of language change, morphosyntactic encoding, case and agreement, markedness, DCM, and the use of case to express semantic distinctions.

2. Indo-Aryan Case Across Time

The history of Indo-Aryan case raises many interesting theoretical and empirical questions. Old Indo-Aryan (OIA) is generally divided into an earlier and a later stage. The earlier stage is Vedic, the language of the Vedas, the oldest attested Hindu writings. The later stage encompasses Sanskrit, which was a high literary language, and various Prakrits, which represented more vernacular uses and gave rise to Middle Indo-Aryan (MIA). As in all language/dialect situations, the boundaries are fluid.

Sanskrit and Vedic had an inflectional case system that functioned much like that of their sister language Latin. This inflectional case system was lost over the period of several centuries in MIA, but a new system arose as part of New Indo-Aryan (NIA). In NIA the case system is mainly realized by adpositions or clitics. This historical change has raised questions about the notion of “case” itself, with one perspective asserting that case should be narrowly defined and only pertain to inflectional morphology. Another perspective takes a more functional approach and abstracts away from the precise morphophonological realization, concentrating on the overall function of the individual items. See Section 2.3.2 for more details.

Another topic that has been the focus of linguistic research is that OIA had no ergative case, but many of the NIA languages do. This led to Indo-Aryan being cited as a textbook case for a shift from a nominative-accusative to an ergative-absolutive system (Dixon, 1994; Harris & Campbell, 1995). However, the data is actually much more complex and different alternative perspectives have been advanced. This topic is addressed in Section 3 and is connected to the precise origin of the NIA case system of adpositions and clitics, briefly addressed as part of Section 2.3, which provides a sketch of the case system in NIA languages as compared to OIA (Section 2.1) and MIA (Section 2.2).

A further connected topic concerns non-nominative or oblique subjects. OIA contains next to no evidence of non-nominative subjects, but they abound in NIA languages. The historical aspects of this are addressed briefly in Section 6, but the bulk of the discussion of the theoretical implications and analyses that have been proposed so far is contained in the sections focusing on synchronic structural aspects of Indo-Aryan case.

2.1 Old Indo-Aryan

Table 2 illustrates the inflectional case system of Sanskrit (Blake, 2001, p. 64). For the sake of simplicity, it shows just one noun class. As in Latin, the surface shape of the inflectional affixes depends on the noun class and on morphophonological processes. There are seven cases, which are presented in Table 2 according to the numbering system used in Pāṇini’s grammar of Sanskrit. This grammar is dated to approximately the 6th century bce (Böhtlingk, 1839–1840/1998; Katre, 1987/1989). For ease of exposition, a column lists the Western (originally Roman) case name that corresponds most closely to the Pāṇinian classification.

As far as I am aware, there is no comprehensive analysis of the Sanskrit case system besides Pāṇini’s original grammar. In modern linguistic terms, Sanskrit is classified as an accusative language. That is, the default case is nominative (case 1) for subjects and accusative (case 2) for objects, as shown in (1). Indirect objects are marked with the dative.

Table 2. Declension of Sanskrit Deva- ‘God’

[Table not reproduced: seven cases in Pāṇinian order, each with its Western name.]

However, as in most languages, the complete picture is more complex. Pāṇini did not operate with notions such as subject or object or structural terms like specifier and complement. Case marking in Sanskrit is instead taken to be governed primarily by semantic or kāraka roles.5 (2) shows some kāraka roles along with their prototypical definitions and corresponding thematic role names.


Pāṇini’s grammar consists of 4000 interdependent rules, ordered mostly by the type of topic addressed and the level of generality. Pāṇini is famous for operating with default rules and underspecification, a formal device that has entered modern linguistics as the Elsewhere Condition (Kiparsky, 1973).6

The rules which define the semantic roles associated with differing verbs and verb classes interact with a set of further rules which govern the overt realization of case. Again, there is a set of default rules which apply generally unless they are overridden by more specific rules occurring later on in the grammar. An example is provided by (3) and (4), whereby (3) is the default rule for accusative. This default rule is overridden by (4) for the case of the verb ‘sacrifice’, which requires an instrumental instead.7
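The interaction between a default rule and a more specific rule can be sketched in a few lines of Python. This is only an illustration of the override logic, not a rendering of Pāṇini's actual sūtras: the `CaseRule` class, the role label `patient`, and the contrast verb `see` are assumptions made for the example, while the verb ‘sacrifice’ and the accusative/instrumental contrast come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseRule:
    """A toy case-realization rule (hypothetical structure, for illustration)."""
    role: str                   # semantic role the rule applies to
    case: str                   # case assigned by the rule
    verb: Optional[str] = None  # None means the rule is a default for any verb

    def matches(self, role: str, verb: str) -> bool:
        return self.role == role and self.verb in (None, verb)


# Rules are listed in grammar order: the general default first,
# the more specific rule later.
RULES = [
    CaseRule(role="patient", case="accusative"),                      # default
    CaseRule(role="patient", case="instrumental", verb="sacrifice"),  # specific
]


def assign_case(role: str, verb: str, rules=RULES) -> Optional[str]:
    """Return the case chosen by the last matching rule, or None if no rule applies."""
    chosen = None
    for rule in rules:
        if rule.matches(role, verb):
            chosen = rule.case  # a later matching rule overrides an earlier one
    return chosen


print(assign_case("patient", "see"))        # accusative
print(assign_case("patient", "sacrifice"))  # instrumental
```

Ordering the rules from general to specific and letting the last match win is one simple way to model the override; a ranking by explicit specificity would serve equally well.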



The overall effect of this interacting system of rules is that there is no simple one-to-one correspondence between semantic roles and case marking. The following picture (5), based on Itkonen (1991, p. 49), illustrates the case-assignment possibilities for the two central semantic roles. The instrumental is taken to be the default realization of agent, the accusative the default of the patient. The agent can also be realized as nominative or genitive, depending on the effects of additional verbal morphology or lexical specifications. The patient can also be realized as dative, nominative, or genitive, again depending on additional verbal morphology, lexical specification, or further semantic factors.


As an illustration, consider the rule in (6), which states that both dative and accusative can be used for the goal of the movement. However, the two cases distinguish between abstract vs. concrete goals.


Indeed, this type of case alternation is not the only instance found in OIA, as the examples in (7) show. This alternation between the accusative and the genitive expresses the so-called partitive alternation, by which a distinction is made between a quantified vs. an unquantified amount of liquid specified by the object.


There is, of course, much more to say about case in OIA. However, the material in this section should suffice to convey the following points. OIA had an inflectional case system. This case system was complex and sensitive to lexical specifications and semantic aspects, but mainly followed a nominative-accusative pattern for main clauses with simple verb morphology. Despite the fact that the agent can be realized by several different case markers, there is little evidence for non-nominative subjects. This issue is taken up in Section 6.

2.2 Middle Indo-Aryan

The inflectional case system of OIA eroded over time. Morphophonological change led to the collapse of distinctions and a heavily syncretic system. In particular, there was a loss of morphological contrast between nominative and accusative as well as between genitive and dative. Table 3 provides a flavor of the changes.8

Table 3. Syncretized Case Paradigm in MIA

−u, a, aṁ                −a, aĩ
−eṁ, iṁ, he, hi          −e(h) ĩ, ehi, ahĩ
−hu, ahu, aho            −hũ, ahũ
−ho, aho, ha, su, ssu    −na, hã
−i, hi, hiṁ


Source: Masica (1991, p. 231).

During MIA the inflectionally rich tense/aspect system of OIA also underwent significant changes (Pischel, 1900). In a fairly typical instance of language change, the inflectional forms were lost and a periphrastic system involving former participles in conjunction with auxiliaries was developed; see Deo (2006) for a detailed recent discussion.

This fundamental change in the tense/aspect system had far-reaching consequences for the case system. A development toward non-nominative subjects was triggered because the original inflectional morphology was being lost and the necessary temporal and aspectual contrasts were instead being expressed via participial adjectival and nominal forms, which served to “demote” the nominative subject so that it was (mostly) realized as an unmarked object.

With respect to the ergative, it was particularly the loss of inflectional past referring forms such as the aorist, the inflectional perfect, and the imperfect that was significant. The adjectival passive participle -ta instead took over as the only past referring device. There is evidence that this -ta participle was already being used to describe past, culminated events in Sanskrit (see, among others, Bynon, 2005; Deo, 2006; Kiparsky, 1998). (8) provides an illustration from Epic Sanskrit.


Given that the -ta forms a participle, the agent is realized as an instrumental, rather than as a nominative, as it would have been if the verb were active and inflected for tense, as the following contrast shows.


With unaccusative verbs as in (10), in contrast, the “highest” argument is/remains nominative. This sensitivity toward lexical semantics in the realization of the subject is due to the fact that the -ta participle had the semantic effect of predicating a result state of an affected argument. As is generally the case for resultatives, for transitive (agentive) verbs, a result state can only be predicated of a theme/patient argument, but not an agent. The main argument of the -ta participle was thus always the nominative, whether it be the theme/patient (‘Brahman’) in an agent–patient constellation as in (9) or the theme/patient (‘I’) of an unaccusative as in (10).


The resultative semantics of -ta ultimately generate an ergative alignment pattern by which objects of transitive verbs and subjects of intransitive (unaccusative) verbs are marked identically. The examples in (11) from an archaic MIA Mahāraṣṭrī text Vasudevahiṃḍī (ca. 500 ce) illustrate this pattern. The king in (11a) is the subject of an unaccusative and is marked nominative. In the transitive (11b) the object ‘well’ is also nominative.


Indeed, both case marking and verb agreement show ergative alignment. The verb agrees with the unaccusative nominative subject in (11a). In (11b), the verb agrees with the nominative object (‘well’). It does not agree with the agentive instrumental subject (‘that running one’). In classic alignment terms (e.g., see Plank, 1979), which use the labels A (agent), S (subject of intransitives), and O (object), the S and the O thus cluster together, while the A stands apart in terms of both verb agreement (the verb does not agree with the A) and case marking (the A is marked instrumental).9

MIA is thus generally seen as having shifted from stative participial constructions to active ergative clauses (Bubenik, 1998; Hock, 1986; Peterson, 1998). However, the picture is more complex.

The passage in (12), taken from the Paumacariu, a Jaina rendition of the epic Rāmāyana, ca. 8th century ce, shows no distinction between subject and object in terms of case marking. Agreement also makes no distinction, as the third singular morphology on the verb could be agreeing with either the subject or the object.


On the other hand, Jamison (2000) describes the intriguing pattern of ergative, perfective clauses in the Niya documents, a collection of texts datable to the third century ce, and so to early MIA. The perfective paradigm based on -ta in this linguistic system innovates a set of new endings (through incorporated auxiliaries) that obligatorily agree in person and number with the clausal subject. In many cases, the subject argument of a transitive perfective clause is nominative, as in (13a). However, in some cases, as in (13b), the subject argument exhibits instrumental/ergative case.


Jamison’s analysis of the Niya documents presents a complex system in which variation between nominative and ergative marking on the subject is sensitive to the animacy of both core arguments (agent and patient). Overt instrumental/ergative marking appears to be “essentially obligatory” when both the agent and the patient are human (Jamison, 2000, p. 73), but may be optional elsewhere. In addition, as shown in (14), there are also attested examples where agentives in nonperfective clauses receive overt marking.


Overall, this picture of an early MIA language points to a system in which agentivity and other semantic factors codetermine the realization of instrumental/ergative case together with factors of structural alignment.

This same combination of semantic and structural conditioning is also found in Aśokan10 inscriptions with -ta participial clauses (Andersen, 1986). Here one sees DCM by which genitive was used for animate agents and the instrumental otherwise.

Jamison (2000) alludes to the lexical semantics of verb classes also playing a role in governing the case marking, but says a more in-depth study is needed. Given that in OIA case marking is sensitive to verb classes and that NIA also shows a correlation between case marking and verb class (e.g., see Khan, 2009; Verma & Mohanan, 1990), it is very likely that Jamison’s hunch that verb classes are playing a role in the distribution of case marking in the Niya documents is correct. It is also known that in MIA experiencer subjects of verbs such as ‘please’, for example, were marked with the dative/genitive (Peterson, 1998, p. 100).

To summarize, MIA saw a massive syncretism of case forms and a reorganization of the tense/aspect system which ultimately licensed the emergence of an ergative case. Despite the syncretism, MIA had a functioning and complex case system in which the realization of case was governed by an interaction of structural and lexical semantic factors.

2.3 New Indo-Aryan

By NIA, the languages often showed only vestiges of the former morphological case system in that some noun classes retained a nominative/oblique distinction. In Urdu/Hindi, for example, only masculine nouns ending in -a inflect morphologically so that there is a nominative/oblique distinction (laṛka ‘boy’/laṛke ‘boy.Obl, boy.Pl’). As of around 1100 ce, new case markers began to be drawn into the system; most of these came from originally spatial terms such as ‘at, near, by, to’. An exception is the genitive, which appears to be derived from a participial form of ‘do’ and hence also inflects for gender and number. For more detailed information and references, see, for example, Hewson and Bubenik (2006), Butt and Ahmed (2011), and Reinöhl (2016).

Most of the new case markers across the Indo-Aryan languages seem to have originated from a small original set of linguistic material, as Table 4 shows for a selection of languages, where most of the case markers are versions of n-, k- and l- forms. The inflectional forms tend to be vestiges of the old case-marking system. In contrast, the new case markers tend to be clitics. This and other facts about the NIA case systems raise interesting issues, which are briefly addressed in the next sections.11

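Table 4. Case Markers Across Indo-Aryan

[Table not reproduced: dative/accusative, ergative, instrumental, and genitive markers for a selection of languages; one surviving form is −(e)r.]
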
Source: Khan (2009), Masica (1991).

2.3.1 Polysemy of Case Forms

There are homophonies that appear commonly across Indo-Aryan (and also more broadly crosslinguistically). The dative and accusative tend to be expressed by the same form, as are the ergative and instrumental. In addition, as shown in the examples from Haryani and Kherwada Wagdi in (15) and (16), there is also a homophonous relationship between ergative and dative/accusative in some languages.12 Or rather, the case forms are polysemous in that they express more than one case function.



Note that I speak intentionally of homophony and polysemy, but not of syncretism. This is because syncretism describes a falling together of previously distinct forms as part of language change. The NIA clitics are innovations and were innovated with polysemous meaning. There is no historical evidence for the development of previously distinct dative vs. accusative or ergative vs. dative/accusative forms that then fell together as part of language change. Butt and Ahmed (2011) make much of this and propose a lexical semantic perspective on the innovation of case marking, in contrast to the bulk of the literature on case innovation that takes a more structurally oriented perspective.

2.3.2 Clitics and Inflections

As seen (partly) in Table 4, case in NIA has different morphosyntactic realizations. There are some vestiges of the old inflectional system that continue to play a role in the modern case system. Most of the innovated case forms, however, are clitics. In addition, postpositions are formed via a complex construction that involves a genitive. (17)–(19) illustrate each of these instances for Urdu/Hindi. Example (17) is a typical transitive sentence in Urdu/Hindi. The subject is ergative, the object is unmarked. There is an object alternation in Urdu/Hindi by which specific or human objects are marked overtly with the dative/accusative marker ko. Whenever a noun is overtly case marked in Urdu/Hindi, it takes an oblique form. However, this is only visible for a subclass of nouns, that is, masculine nouns ending in −a. As such, the noun ‘yasin’ is invariant, but the noun kutt-a ‘dog’ inflects to reflect nominative (unmarked) vs. oblique morphology.


The oblique morphology can also occur on its own, as shown in (18) for ḍakxan-a ‘post office’ where it expresses a locative meaning.


Example (19) illustrates the use of a postposition, which is composed of the noun in its oblique form, an inflecting genitive marker, and a spatial term that originates from a noun meaning ‘back/behind’.


The descriptive literature describes this pattern in terms of case layering (Masica, 1991), whereby the inflectional morphology represents the innermost layer of case marking (Layer 1), the clitics an intermediate layer, and the postpositions the outermost layer. The morphosyntactic status of case markers in most Indo-Aryan (IA) languages is unclear as most authors do not make careful distinctions between inflections, clitics, and adpositions, even though these show very different distributional properties. Some exceptions are Friedman (1991) and Butt and King (2004). One conclusion that can be drawn from Urdu/Hindi and the IA patterns more broadly is that they seem to conform quite nicely to Svenonius’s spatial P hypothesis (Svenonius, 2010).
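The interplay of the first two layers in Urdu/Hindi, together with the object alternation involving ko, can be made concrete with a small Python sketch. It is a deliberately simplified illustration: the function names and the boolean flags are hypothetical, the transliteration is flattened (kutta for kutt-a), and only the generalizations stated above are encoded (the oblique is visible only on masculine nouns in -a, and specific or human objects take ko).

```python
from typing import Optional


def oblique(noun: str, masculine_a_stem: bool) -> str:
    """Layer 1: oblique inflection, visible only on masculine nouns ending in -a."""
    if masculine_a_stem and noun.endswith("a"):
        return noun[:-1] + "e"
    return noun  # all other nouns are invariant


def case_mark(noun: str, masculine_a_stem: bool, clitic: Optional[str] = None) -> str:
    """Layer 2: a case clitic attaches to the oblique form of the noun."""
    if clitic is None:
        return noun  # unmarked (nominative) form
    return f"{oblique(noun, masculine_a_stem)} {clitic}"


def mark_object(noun: str, masculine_a_stem: bool, specific_or_human: bool) -> str:
    """Object alternation: specific or human objects are marked with ko."""
    return case_mark(noun, masculine_a_stem, "ko" if specific_or_human else None)


print(mark_object("kutta", True, specific_or_human=False))  # kutta
print(mark_object("kutta", True, specific_or_human=True))   # kutte ko
print(mark_object("yasin", False, specific_or_human=True))  # yasin ko
```

The third layer (genitive plus spatial noun) is left out here, since its internal agreement properties would require more lexical information than the text provides.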

2.3.3 Exponence

A look at the WALS information on case in NIA (Feature 49A, see also Feature 51A) shows that it records just two cases for Urdu and Punjabi. In contrast, Table 4 lists a separate form for dative/accusative, ergative, instrumental, and genitive for each of these languages. The disparity between the WALS information and the information gathered in Table 4 is due to two fundamentally different approaches to case that can be found in the literature. Butt and King (2004), the discussion here, and the bulk of the literature on NIA languages assume what one might call a functionally oriented perspective. That is, case is viewed from a comparative-linguistics perspective and a case marker is defined by the role it plays in marking clause participants. This perspective abstracts away from the precise morphosyntactic realization, that is, in the Urdu/Hindi examples above, both the oblique inflectional −e and the clitics are regarded as case markers.

This stands in contrast to what one might call a strict morphological perspective, which holds that a central, definitional property of case is that it is inflectional. That is, languages like Latin and Sanskrit with their inflectional paradigms (cf. Table 2) are taken to be prototypical case-marking languages. Languages like Urdu/Hindi are considered to only have a two-way case contrast (nominative vs. oblique), with all other markers treated as adpositions of various kinds (clitic status is generally not taken into account). This is the perspective taken by WALS and has been articulated in some detail for Urdu/Hindi by Spencer (2005).

2.4 Summary

This concludes the discussion of issues particular to the individual language stages. The next sections present issues that pertain to understanding language patterns across time.

3. Ergative vs. Accusative Alignment

One major issue that has been discussed with respect to IA is that of accusative vs. ergative alignment. Based on proposals originating from Fillmore’s (1968) Case Grammar, there is an idea in the literature that languages can be classified according to how core grammatical relations are “aligned”. An accusative language is one in which subjects of transitive and intransitive sentences are marked the same way (generally with a nominative) vs. the object (accusative) and are therefore aligned with one another. An ergative language, in contrast, marks the object (O) and the intransitive subject (S) in the same way (usually absolutive/nominative) vs. the transitive subject (A), whose specialized marker is generally called an ergative (see Butt, 2006b for further discussion on alignment typologies and Case Grammar).13
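The alignment types just defined amount to a comparison of how A, S, and O are marked. The following Python sketch makes that comparison explicit. The function name and the case labels passed to it are illustrative assumptions; the labels "neutral" and "other" are catch-all names introduced here, and "tripartite" follows the WALS usage mentioned in Section 1.2.

```python
def classify_alignment(a: str, s: str, o: str) -> str:
    """Classify a marking pattern from the case labels on A, S, and O."""
    if a == s == o:
        return "neutral"      # no argument is distinguished by marking
    if a == s:
        return "accusative"   # S patterns with A; O stands apart
    if s == o:
        return "ergative"     # S patterns with O; A stands apart
    if a == o:
        return "other"        # A and O alike, S apart
    return "tripartite"       # A, S, and O are all marked differently


print(classify_alignment("nom", "nom", "acc"))  # accusative
print(classify_alignment("erg", "nom", "nom"))  # ergative
print(classify_alignment("erg", "nom", "acc"))  # tripartite
```

Note that the classification applies to a single clause type; a language whose marking varies with aspect or person would yield different results for different clauses.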

This conception of alignment is mainly structural. Agreement relations have also been thought of as expressing ergative vs. accusative alignment and there is a school of thought that ties case and agreement together very closely as structural features of a clause (Section 4).

IA has been discussed as showing all these patterns, and in particular, as showing evidence for language change in terms of a shift from an accusative to an ergative language (and back to an accusative in some cases). The literature on this topic is comparatively large. This section provides a synopsis of the main lines of thought and relevant references.

Example (20) illustrates the major change that took place: the -ta participle originally took a nominative subject and an instrumental agent; this was reanalyzed as a non-nominative, ergative subject with a nominative object.


Besides the ergative marking of the former instrumental, the agreement relations also show ergative alignment, since the verb continues to agree with the nominative, the former subject. This is not evident in (20), but the examples in (21) illustrate this pattern with the first-person-plural pronoun amhẽ. These are again taken from the Paumacariu (ca. 8th century ce). (21a) contains the syncretized pronoun amhẽ, which triggers agreement in the imperfective aspect while the same form fails to trigger agreement in the perfective (21b).


IA also shows ergative patterns where a dedicated ergative case marker is absent and the ergative alignment is evident only via the agreement patterns. The following is an example from Old Hindi (early NIA). The subject is Kabir, which has been glossed as oblique, but which in fact shows no special marking since it is not a noun ending in -a, the only class of nouns in Hindi retaining the vestiges of the former case system. The verbs ‘touch’ and ‘take’ each agree with the feminine object (‘paper’, ‘pen’), but not with the masculine agent Kabir.


Given data like the above, IA has been treated as a textbook case for a shift from a nominative-accusative to an ergative-absolutive system (Dixon, 1994; Harris & Campbell, 1995). However, Butt (2001, 2006a) offers a different perspective, showing that the idealized textbook scenario does not hold up. In dialects of Urdu/Hindi, the dedicated ergative case marker was innovated centuries after the new dative/accusative marker, a fact which is unexpected under the alignment change hypothesis (see Khokhlova, 2016; Stronski, 2014; Wallace, 1982 for further examples and discussion of early NIA dialects). The hypothesis also does not expect polysemy between datives and ergatives (cf. Section 2.3.1). There are clearly subtle semantic factors at play, such as those detailed by Jamison (2000) and Bynon (2005), rather than a structural realignment envisioned by the textbook scenario. In fact, what occurred in the history of IA was a shift by which non-nominative subjects were introduced systematically into the system. Why and how this took place and why IA languages show both interesting common patterns and typological diversity continues to be in need of explanation.14

The discussion around ergativity and ergative alignment is generally conducted without reference to other non-nominative subjects, such as experiencer subjects, which are common in NIA (Verma & Mohanan, 1990) but were completely absent from Old Indo-Aryan (OIA). It is also generally conducted without reference to DCM, which is curious since ergative case is often used to express a semantic distinction in alternation with another case (e.g., dative or nominative). Butt and Ahmed (2011) therefore plead for a perspective on case that integrates lexical and clausal semantics into the structural ideas on case and case alignment. A slightly different perspective, but one working along the same overall lines, is provided by Montaut (2003, 2006, 2009, 2013, 2016). For some discussion of NIA languages first innovating an ergative and then losing it, see Khokhlova (1992, 2001).

There are some further issues with respect to case and alignment. One is that a distinction has been made in the past between syntactically accusative/ergative and morphologically accusative/ergative languages. In syntactically ergative languages, the grouping of S and O in opposition to A goes beyond surface marking such as case and has syntactic consequences in terms of, for instance, control. See Dixon (1994) for an in-depth discussion.

As Klaiman (1987) showed, the NIA languages are syntactically accusative (also Dahl & Stroński, 2016; Pandharipande, 1981; Pandharipande & Kachru, 1977). Indeed, the evidence for syntactically ergative languages at this point is basically nonexistent; see Legate (2012, 2014).

Finally, most ergative languages are actually split-ergative and this pertains to NIA as well. One well-known split is according to tense/aspect, with ergatives associated with past/perfect. This is generally true for NIA as well, but there are languages like Nepali (Poudel, 2020) which show ergatives in all tenses and use ergatives to express semantic information such as stage-level vs. individual-level predication (Kratzer, 1995).15 Another well-known split is according to person (Silverstein, 1976) and many NIA languages show this; Punjabi is a prominent example. See Deo and Sharma (2006) and Butt and Deo (2017) for further discussion, and Liljegren (2014) for a recent survey of Dardic (Hindukush) languages.

4. Agreement

Case and agreement are two phenomena that have been tied together very closely, particularly in the theoretical framework of government and binding (GB; Chomsky, 1981) and further developments in that theoretical line, for example, current minimalism. This theoretical framework confusingly distinguishes between Case and case, whereby the uppercase version actually mainly reflects grammatical relations and the lowercase version reflects morphological case. Agreement and Case are seen to work together to determine syntactic structure, and the intertwinement has played a prominent role in discussions of ergativity (e.g., see Bittner & Hale, 1996 as a representative sample).

Neither structural Case nor morphological case shows much correlation with agreement in the typological space of NIA languages. Some languages, like Nepali, always show subject agreement, regardless of what case marking the subject has (e.g., nominative, ergative, dative). Some languages, like Urdu/Hindi, only agree with unmarked arguments. Other languages show a more complex picture, with issues of split-ergativity also playing a role. Subbarao (2011) provides an interesting survey of possible patterns in South Asian languages (see also Das, 2006; Verbeke, 2013a). His overall goal is to find a unifying explanation of the variety of agreement patterns surveyed, but he does not ultimately arrive at a satisfactory solution.
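The contrast between the Nepali-type and the Urdu/Hindi-type agreement pattern can be made concrete with a small sketch in Python. This is only an illustration under stated assumptions: the names `Argument` and `agreement_controller` are hypothetical, and the details of the Urdu/Hindi branch (subject preferred over object, no controller when every argument is case marked) are simplifying assumptions that go beyond what is stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Argument:
    """A core argument of a clause (hypothetical toy representation)."""
    role: str             # "subject" or "object"
    case: Optional[str]   # None = unmarked (nominative); else e.g. "erg", "dat", "acc"


def agreement_controller(language: str, arguments: List[Argument]) -> Optional[Argument]:
    """Return the argument the verb agrees with, or None if there is none.

    "nepali":     the subject always controls agreement, whatever its case.
    "urdu_hindi": only unmarked arguments can control agreement.
                  Assumption: the subject is tried before the object, and
                  None (default agreement) results if both are case marked.
    """
    if language == "nepali":
        return next((a for a in arguments if a.role == "subject"), None)
    if language == "urdu_hindi":
        for role in ("subject", "object"):
            for a in arguments:
                if a.role == role and a.case is None:
                    return a
        return None
    raise ValueError(f"no agreement rule sketched for {language!r}")


# An ergative subject with an unmarked object: the two systems diverge.
clause = [Argument("subject", "erg"), Argument("object", None)]
nepali_controller = agreement_controller("nepali", clause)
urdu_hindi_controller = agreement_controller("urdu_hindi", clause)
```

For the same clause, the Nepali-type rule picks the ergative subject while the Urdu/Hindi-type rule picks the unmarked object, which is why a single correlation between case and agreement does not carry across the NIA languages.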

In a more recent development within minimalism, Baker (2015) devotes a considerable amount of space to a careful and detailed argument as to why agreement and case should not be seen as necessarily working hand in hand. While the book does not contain much data from IA languages, the comparative evidence from these languages would make his argument even stronger if it were adduced; see Patel-Grosz (2021) for some recent discussion of Indo-Aryan data exactly along these lines.

5. Dependent Case, Markedness, and Differential Case Marking

Baker (2015) argues for the notion of Dependent Case by which grammatical relations are assigned case marking via a dependency relation with another grammatical relation (Marantz, 2000). In particular, accusative is seen as a dependent case that is assigned when the subject is unmarked (nominative) and ergative is seen as a dependent case that is assigned when the object is unmarked (nominative/absolutive).
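The logic of the two dependent cases as summarized here can be sketched as a small Python function. This is a deliberately minimal illustration, not Baker's formalization: the function name `dependent_case`, the parameter names, and the use of `None` for "not yet case marked" are all assumptions made for the example.

```python
from typing import Optional, Tuple


def dependent_case(system: str,
                   subject_case: Optional[str] = None,
                   object_case: Optional[str] = None) -> Tuple[str, str]:
    """Assign case to the two arguments of a transitive clause.

    `subject_case` / `object_case` hold any case fixed independently
    (e.g. a lexically determined dative); None means still unmarked.
    Returns the final (subject_case, object_case) pair.
    """
    if system not in ("accusative", "ergative"):
        raise ValueError(f"unknown system: {system!r}")

    both_unmarked = subject_case is None and object_case is None
    if both_unmarked and system == "accusative":
        # Accusative: dependent case on the object, licensed by an unmarked subject.
        object_case = "acc"
    elif both_unmarked and system == "ergative":
        # Ergative: dependent case on the subject, licensed by an unmarked object.
        subject_case = "erg"

    # Whatever remains unmarked surfaces as nominative/absolutive.
    return (subject_case or "nom", object_case or "nom")


accusative_pattern = dependent_case("accusative")
ergative_pattern = dependent_case("ergative")
dative_subject_pattern = dependent_case("accusative", subject_case="dat")
```

Under these assumptions at most one of the two arguments ever receives a dependent case, so a clause with both an ergative subject and an accusative object cannot be produced by the sketch.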

It seems intuitively correct that the case marking of one argument in a transitive clause shows sensitivity or dependency on the case marking of the other argument.16 The clause participants do stand in a semantic relationship toward one another (typically, agent and patient) and this needs to be expressed via case marking. However, the Dependent Case theory seeks to codify this relationship as resulting from purely structural configurations. Mahajan (2000, 2017) espouses a related structural perspective on case and has been working to fit the Urdu/Hindi pattern into this overall view.

The theory of Dependent Case can be seen as a more specialized variant of more general notions of markedness. Malchukov and de Swart (2009) and de Hoop (2009) provide comprehensive surveys of the state of the art with respect to case and markedness. In particular, they tie notions of markedness in with the appearance of DCM and seek to address the overall question of why some languages exhibit accusative systems vs. ergative systems vs. mixed systems. The literature on DCM and markedness suggests that the emergence of the various observed patterns is rooted in the maximization of distinctions between the (two) core arguments in a clause. Generally, one argument should be marked overtly to distinguish it from the other, which is unmarked. This is much like the core notion propagated by Dependent Case.

Differential subject marking (DSM) in particular is taken to imply the presence of two different strategies for argument marking. These may sometimes conflict with one another, leading to mixed patterns.


Distinguishing strategy: In order to distinguish subjects from objects, mark nonprototypical subjects (i.e., subjects which could be mistaken for objects).


Indexing strategy: Identify prototypical subjects (agents) and mark this particular semantic role.

New case markers are predicted to arise first in situations where it is difficult to distinguish agents/subjects from patients/objects, that is, in marked situations. Typical agents/subjects are taken to be animate, agentive (transitive), and/or topical. Typical patients/objects are inanimate and/or indefinite.
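The potential for conflict between the distinguishing and the indexing strategy can be illustrated with a short Python sketch. The class `Subject`, the function names, and in particular the reduction of prototypicality to "animate and agentive" are assumptions made for illustration only; topicality is left out to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Toy feature bundle for a subject (hypothetical representation)."""
    animate: bool
    agentive: bool


def is_prototypical(subject: Subject) -> bool:
    # Assumption: a prototypical subject is both animate and agentive.
    return subject.animate and subject.agentive


def marks_subject(strategy: str, subject: Subject) -> bool:
    """Does the given strategy call for overt case marking on this subject?"""
    if strategy == "distinguishing":
        # Mark subjects that could be mistaken for objects.
        return not is_prototypical(subject)
    if strategy == "indexing":
        # Mark prototypical agents as such.
        return is_prototypical(subject)
    raise ValueError(f"unknown strategy: {strategy!r}")


typical_agent = Subject(animate=True, agentive=True)
atypical_subject = Subject(animate=False, agentive=False)

# The two strategies give opposite answers for the same subject.
conflict_on_agent = (marks_subject("distinguishing", typical_agent)
                     != marks_subject("indexing", typical_agent))
```

In this simplified form the two strategies disagree on every subject, which is one way of seeing why a language in which both are active is expected to show mixed marking patterns.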

Ergative languages are taken to be one typical result of these two basic strategies. Another prediction of the markedness-based account of the distribution of case marking is the emergence of differential object marking (DOM) systems.

The scenario here is that objects which could be mistaken for subjects are overtly marked via the distinguishing strategy. Consider (23) and (24), which show that animate objects in Urdu/Hindi must be marked overtly, whereas inanimates show DOM by which the overtly case-marked version denotes specific objects (Butt, 1993; Dayal, 2011). The markedness hypothesis for DOM articulated by Malchukov and de Swart (2009) and de Hoop (2009) sees the marking of animate objects as being extended over time to a general definiteness/specificity marking of objects.
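The Urdu/Hindi object-marking generalization just described can be stated as a two-line decision rule. The Python sketch below is a simplification for illustration; the function name `object_is_overtly_marked` and the reduction of the relevant properties to two Boolean features are assumptions, not part of the cited analyses.

```python
def object_is_overtly_marked(animate: bool, specific: bool) -> bool:
    """Toy rule for overt object marking in Urdu/Hindi.

    Animate objects are always overtly marked; inanimate objects are
    overtly marked only when they are interpreted as specific.
    """
    if animate:
        return True
    return specific


# The alternation (DOM proper) is only available for inanimate objects.
inanimate_nonspecific = object_is_overtly_marked(animate=False, specific=False)
inanimate_specific = object_is_overtly_marked(animate=False, specific=True)
animate_object = object_is_overtly_marked(animate=True, specific=False)
```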



There are several further predictions that this perspective on DCM brings with it. For one, given that two, sometimes conflicting, strategies are taken to be active in DSM, as opposed to the single distinguishing strategy for DOM, the expectation is that a greater number and variety of types of DCM should be found in DSM situations than in DOM situations. Additionally, more DSM is expected in ergative languages while more DOM is expected in accusative languages.

These are interesting predictions which should be tested for a wide range of languages. With respect to IA, however, the predictions do not seem to hold up. While a more thorough investigation remains to be undertaken, the available evidence so far suggests that IA languages show both DSM and DOM and that both types seem to vary equally. There are both ergative and accusative languages within IA and both types seem to have equal amounts of DSM and DOM.

This section briefly presents some of the unexpected patterns. Consider, for example, (25) and (26). An ergative marks the subject, but additionally the object also carries an overt case marker. For purposes of indexing or distinction of arguments, having two overt case markers is a situation of overkill.

Furthermore, as already seen above and repeated in (25) and (26), one finds examples in NIA in which the same case form marks both subject and object. From the perspective of indexing and distinction of arguments, this is again unexpected as disambiguation of arguments is not being achieved when both arguments carry the same case forms.



DOM is usually considered to be asymmetric such that there is an alternation between an unmarked and an overtly marked form (as in (24)). However, one finds patterns such as the Urdu/Hindi alternation in (27) quite regularly across South Asian languages (e.g., see Khan, 2009), generally accompanied by a difference in semantics/pragmatics. Example (27) features an alternation between ko and se, whereby se indicates a more indirect type of interaction. Another example is furnished by (28), where the DCM uses two different ablatives to distinguish between dynamic vs. static paths.



Recall that more DSM is expected in ergative languages while more DOM is expected in accusative languages. South Asian languages overall include both ergative and accusative types, but the possibilities for DOM and DSM seem equal. For example, no Dravidian language is ergative, but they exhibit DSM. In particular, DSM seems to be used for an expression of modality, as illustrated in (29) for Urdu,17 and in (30) and (31) for Malayalam, a Dravidian language (Butt et al., 2004). Indeed, Montaut (2016) considers the expression of modality to be a crucial ingredient for the development of ergative case markers in Indo-Aryan.




Also consider the Bengali alternation in (32) and the use of the ergative to express stage- vs. individual-level predication in Nepali illustrated in (33) and (34) (Poudel, 2020).




In sum, Indo-Aryan contains DCM systematically. The patterns are not as simple as expected under a markedness/indexing view and they include semantic factors that are not considered relevant from the Dependent Case perspective, which is interested only in explaining the core structural properties of case marking (but see Næss, 2006, who recasts the issue in terms of semantic factors in transitive clauses). In IA languages, structural, morphological, and semantic conditions on case marking appear to be heavily intertwined, casting doubt on whether a concentration on core structural properties can really lead to an understanding of the individual languages’ case-marking systems, and therefore, by extension, to a deep comparative linguistic understanding.

Consider the DCM between instrumental and accusative marking found in Sanskrit as illustrated in (35) (taken from Speijer, 1886/1973, §49) and a parallel in Urdu/Hindi, shown in (36). The alternation involves causees, which are arguably a centrally structural part of a causative clause (e.g., see Butt, 1998, and Ramchand, 2008 for analyses in different frameworks working on this assumption), but clearly also have a systematic semantic dimension (Saksena, 1980). In the alternation below, the accusative conveys that the causee is also affected in some sense by the caused event, that is, it is also a patient. The systems of the two language stages are remarkably similar, despite the fact that massive changes occurred over time in the grammatical and inflectional system—a fact that has so far not been accounted for satisfactorily.



Recent typologically informed work by Dalrymple and Nikolaeva (2011) also points toward a close connection between meaning and structure with respect to case marking. Dalrymple and Nikolaeva (2011) invoke pragmatic forces as the primary driver for the innovation of object marking and DOM. In particular, the Urdu/Hindi specificity object alternation is seen as being the result of initial secondary topic marking, in line with a crosslinguistic pattern found across a wide array of languages. Information structure has also been established to play a role in case marking in the broader South Asian context; see, for example, Hyslop (2010) for Tibeto-Burman. As already mentioned, South Asian languages do function broadly similarly, and evidence from other language families in the area also needs to be factored in for a broader understanding of IA case systems.

6. Non-Nominative Subjects

Given that several different types of language families have been in contact with one another over the course of several millennia, language contact has played a significant role in the current structure of the languages (cf. Hock & Bashir, 2016). One area where it appears to be centrally relevant is with respect to experiencer subjects (Verma & Mohanan, 1990). While some IA languages have an ergative subject and others do not (e.g., Bengali), all IA languages appear to have experiencer subjects. The examples in (37) are from Marathi.


Experiencer subjects are generally marked with a dative or genitive case and are, besides ergative subjects, just one instance of non-nominative subjects in IA (Bhaskararao & Subbarao, 2004). The examples in (38) show the possible range identified for Urdu/Hindi (Mohanan, 1994).


The appearance of non-nominative subjects in New Indo-Aryan (NIA) languages stands in stark contrast to Old Indo-Aryan (OIA), which shows little to no evidence for non-nominative subjects. Hock (1990, 1991) does, however, conclude that examples such as (39) can be considered to contain a non-nominative subject. This type of genitive possessor is also widely found in NIA; see (38e) for a parallel example from Urdu/Hindi.


Middle Indo-Aryan (MIA) also showed case alternations with constructions that would involve subjects in NIA. One example is the instrumental/nominative alternation found in the Niya documents discussed above. Another is an alternation between genitives and instrumentals documented by Andersen (1986) in Asokan inscriptions for the agent of the ta participle. The instrumental was used as a default while the genitive was restricted to animate agents.

7. Further Issues

There are some further issues that have not been discussed as yet. One is the relationship between case marking and the ability to control into nonfinite clauses; see Davison (1985, 2008) for some discussion. Another is the relationship between case and classifier systems. Classifiers are found on noun phrases mainly in the Eastern IA languages (e.g., Bengali) and it is likely that these are due to language contact with the neighboring Tibeto-Burman languages, but the effect or interaction with case systems remains to be explored.

8. In Sum

This contribution has provided an overview of the major issues discussed to date with respect to case in IA languages. It has also attempted to provide a flavor of the type of case-marking patterns typically found in IA and to convince the reader that a complex interplay of factors is involved in these, on which much work remains to be done.


I have written on synchronic and diachronic issues with respect to case for a number of years and with a number of colleagues. Of these, I would particularly like to mention Tracy Holloway King and Ashwini Deo.

Further Reading

This section lists papers in alphabetical order that provide further overviews or central discussions of the core phenomena surveyed in this contribution.

  • Bhaskararao, P., & Subbarao, K. (Eds.). (2004). Non-nominative subjects. Amsterdam, the Netherlands: John Benjamins.
  • Butt, M. (2001). A reexamination of the accusative to ergative shift in Indo-Aryan. In M. Butt & T. H. King (Eds.), Time over matter: Diachronic perspectives on morphosyntax (pp. 105–141). Stanford, CA: CSLI Publications.
  • Butt, M. (2006). Theories of case. Cambridge, UK: Cambridge University Press.
  • Butt, M. (2017). Hindi/Urdu and related languages. In J. Coon, D. Massam, & L. deMena Travis (Eds.), The Oxford handbook of ergativity (pp. 807–831). Oxford, UK: Oxford University Press.
  • Butt, M., & Deo, A. (2017). Developments into and out of ergativity: Indo-Aryan diachrony. In J. Coon, D. Massam, & L. deMena Travis (Eds.), The Oxford handbook of ergativity (pp. 530–552). Oxford, UK: Oxford University Press.
  • Butt, M., & King, T. H. (2004). The status of case. In V. Dayal & A. Mahajan (Eds.), Clause structure in South Asian languages (pp. 153–198). Berlin, Germany: Kluwer Academic Publishers.
  • Dahl, E., & Stroński, K. (Eds.). (2016). Indo-Aryan ergativity in typological and diachronic perspective. Amsterdam, the Netherlands: John Benjamins.
  • Deo, A., & Sharma, D. (2006). Typological variation in the ergative morphology of Indo-Aryan languages. Linguistic Typology, 10(3), 369–418.
  • Hewson, J., & Bubenik, V. (2006). From case to adposition: The development of configurational syntax in Indo-European Languages. Amsterdam, the Netherlands: John Benjamins.
  • Hock, H. H., & Bashir, E. (Eds.). (2016). The languages and linguistics of South Asia: A comprehensive guide. Berlin, Germany: Mouton de Gruyter.
  • Klaiman, M. H. (1987). Mechanisms of ergativity in South Asia. Lingua, 71, 61–102.
  • Mahajan, A. (2017). Accusative and ergative in Hindi. In J. Coon, D. Massam, & L. deMena Travis (Eds.), The Oxford handbook of ergativity (pp. 86–108). Oxford, UK: Oxford University Press.
  • Masica, C. (1991). The Indo-Aryan languages. Cambridge, UK: Cambridge University Press.
  • Patel-Grosz, P. (2021). Ergativity in Indo-Aryan. In Oxford Research Encyclopedia of Linguistics.
  • Subbarao, K. (2012). South Asian languages: A syntactic typology. Cambridge, UK: Cambridge University Press.
  • Verma, M., & Mohanan, K. (Eds.). (1990). Experiencer subjects in South Asian languages. Stanford, CA: CSLI Publications.


  • 1. Some of the material presented here is based on close cooperation with Tracy Holloway King and Ashwini Deo and has appeared in joint publications over the years (Butt & Deo, 2001, 2017; Butt & King, 1991, 2004).

  • 2. Ethnologue.

  • 3. This contrasts with the 19th and the late 18th centuries, which saw the rise of the Neogrammarian school and the development of comparative linguistics. Both were fueled substantially by the new discoveries related to Sanskrit and other languages found in the subcontinent such as Persian, leading to the discovery of the Indo-European language family.

  • 4. This is irrespective of whether they make use of an ergative or not, a point worth making as connections between an overt ergative case and the absence of a ‘have’ verb have been drawn, cf. Mahajan (1994, 1997, 2004).

  • 5. Much of this discussion is based on Butt (2006b).

  • 6. For more details, see also the online browsable version of Pāṇini’s grammar.

  • 7. Translation/interpretation of rules taken from Katre (1987/1989).

  • 8. The discussion in this section is based primarily on Butt and Deo (2017).

  • 9. Evidence that the agentive argument is indeed the subject of the clause in (11b) comes, for example, from control of gerundial clauses.

  • 10. Asoka was an Indian emperor who ruled from about 268 to 232 bce.

  • 11. A side note on Romani is in order here. Romani is an Indo-Aryan language spoken in Europe. Due to massive language contact with European languages over the space of a millennium, it shows some properties which are atypical of Indo-Aryan languages and which interact with case marking. It contains prepositions rather than postpositions and has developed a determiner system (Elšík & Matras, 2000; Friedman, 1991). As Bubenik (2000) shows, the mainly inflectional case system of Romani is close to MIA and finds the greatest parallels with Sindhi or Kashmiri. Like in other Indo-Aryan languages, the genitive has been innovated, probably from a participial form of the original Sanskrit verbal root kṛ ‘do’ (Friedman, 1991; Koptjevskaja-Tamm, 2000).

  • 12. In Kherwarda Wagdi, ne and ne are allophonic variants of one another (Phillips, 2013).

  • 13. The WALS perspective can be found by looking at feature 98A for case alignment.

  • 14. Other Indo-European language branches such as Romance and Germanic also lost their original case system and showed a reorganization of their tense/aspect system with the drawing in of participial forms as the original inflectional system was lost. However, these branches did not innovate new case forms en masse, and they generally do not allow for non-nominative subjects, although Icelandic is a well-known exception (Zaenen et al., 1985). Instead, they innovated a determiner system (Hewson & Bubenik, 2006).

  • 15. See also Li (2007) and Verbeke (2013b) for a different take.

  • 16. For a sophisticated typological investigation of alignment effects and co-argument sensitivity, see Witzlack-Makarevich et al. (2016).

  • 17. Note that in this construction the ergative appears without attendant perfective morphology.