Big Data and Communication Research
Abstract and Keywords
Communication research has recently seen an influx of groundbreaking findings based on big data. Examples include analyses not only of Twitter, Wikipedia, and Facebook, but also of search engine and smartphone use. These can be grouped together under the label “digital media.” This article reviews some of the main findings of this research, emphasizing how big data findings contribute to existing theories and findings in communication research, a connection that has so far been lacking. To do this, an analytical framework is developed concerning the sources of digital data and how they relate to the pertinent media. This framework shows how data sources support making statements about the relation between digital media and social change. It also distinguishes a number of subfields to which big data studies contribute, including political communication, social network analysis, and mobile communication.
One of the major challenges is that most of this research does not fall into either of the two main traditions in the study of communication, mass and interpersonal communication. This is readily apparent for media like Twitter and Facebook, where messages are often distributed in groups rather than broadcast or shared between only two people. The challenge also applies, for example, to the use of search engines, where the technology can tailor results to particular users or groups (this has been labeled the “filter bubble” effect). The framework is used to locate and integrate big data findings in the landscape of communication research, and thus to provide a guide to this emerging area.
Communication research has recently seen a large number of publications with groundbreaking findings based on big data. Examples include analyses of microblogging services (Twitter), online information sources (Wikipedia), social network sites (Facebook), search engine behavior (Google), and smartphone use. The main reason big data research has become so prominent is that new sources of digital data have become available that were not previously accessible to researchers. At the same time, since most of these data come from commercial companies, the question of just how accessible these sources are is perhaps the single most important challenge to this otherwise burgeoning area of research. A second important challenge, though one that may be resolved in due course, is that most of this research does not fall into either of the two main traditions in the study of communication: mass and interpersonal communication. That is because digital media often fall in between the two, as when Facebook users share news among their groups of friends, or when Twitter hashtags are created for particular events and thereby create an audience around the event, rather than being part of one-to-one or broadcast communication.
This second challenge will be a theme below, so it is worth spelling out further at the start: on Twitter and Facebook, people look at links based on what their friends or the accounts they follow post, or on their feeds, which are shaped partly by the people in their networks. This means that even if a link comes from a traditional news source, it is often shared among groups rather than being broadcast or exchanged between only two people. The same applies to search engines, which can be seen as gatekeepers inasmuch as they tailor content to particular audiences (Pariser, 2011), even if, unlike traditional mass media, they do not themselves produce content. Another example is Wikipedia, which, despite being one of the most popular websites, does not easily fit into existing categories of communication or of established sources of information, such as those produced by academic researchers or accredited professionals. The lack of a model or theory for digital or social media is a problem for communication research generally, and it makes it particularly difficult to fit big data findings into existing traditions of communications and media research.
Some of the main findings in this new area of research are reviewed first, with particular emphasis on how big data findings contribute to specific theories or research agendas in communication studies. The review covers, in turn, research about information seeking (Wikipedia), social network sites (Facebook), the Web as a source of online information, microblogging (Twitter), search engines, and mobile phones. The discussion of these data sources touches on a number of communication subfields to which big data studies contribute, including political communication and mobile communication. Finally, there are pointers to future directions in this area, leading to the question of how these new findings will contribute to communications research. More broadly, the conclusion also puts big data in a wider perspective: how are big data changing not just communications research, but also the role of media in society at large?
This overview will necessarily be selective: there are too many studies in this rapidly growing area for a comprehensive review (but see Ekbia et al., 2015; Golder & Macy, 2014; and the “Discussion of the Literature” and “Further Reading” below). Instead, it highlights findings from a number of studies that represent a wide range of new digital media. In each case, we can ask: How do the data sources provide new insights? What are the main findings? Where can the findings be located in relation to existing research? And how can these findings be built upon, and what are their limitations? We begin with Wikipedia.
Wikipedia and Other Information Sources
Wikipedia has been widely researched, and not only with big data approaches: a recent review counted almost 3,000 papers about Wikipedia by 2013 (Bar-Ilan & Aharony, 2014). Most big data studies focus on Wikipedia entries and on the process of collaboration. Very few studies, in contrast, have examined who reads or accesses Wikipedia, which is arguably just as important for communication research, particularly since Wikipedia is the only noncommercial website that is consistently among the ten most frequently accessed websites worldwide (Alexa). Wikipedia is used across the world and exists in many languages (List of Wikipedias), though it has competitors. The main alternative, at least in mainland China, is Baidu Baike, a Chinese-language online encyclopedia based on the Wikipedia model and developed under the auspices of the Chinese search engine company Baidu. In mainland China, Baidu Baike is more commonly used than the Chinese-language version of Wikipedia, which was banned for several years (Liao, 2009). Among the large Chinese-speaking population outside the People’s Republic, however, Chinese Wikipedia is more popular than Baidu Baike. This matters because any big data findings about Wikipedia, whether for the English-language version or for the Chinese or other language versions, need to be read with the caveat that Wikipedia is used very widely, but not universally.
Wikipedia is an important source of big data because it is openly accessible to researchers, which makes it more reliable as a data source than proprietary sources. Beyond openness, research using Wikipedia data can be built upon by others (replicability), which is a criterion for valid scientific knowledge (a point discussed further in the conclusion). In this respect, it is useful to take a quick detour to consider the criticisms of a widely referenced study that used Google Trends to try to predict flu outbreaks. In the study (Ginsberg et al., 2009), researchers claimed that flu could be predicted by analyzing how often and where “flu” and related terms were searched for on the Google search engine. Lazer, Kennedy, King, and Vespignani (2014) criticized this study for its methodology, but also on the grounds that Google’s “black box” did not allow researchers to ascertain how Google Trends data are arrived at.
How does this relate to Wikipedia? Another group of researchers went on to show that access to articles about flu and other diseases on Wikipedia can predict disease outbreaks more accurately than Google Trends (Generous, Fairchild, Deshpande, Del Valle, & Priedhorsky, 2014; see also McIver & Brownstein, 2014). (Tellingly, and echoing the point made earlier about online encyclopedias in China as against the world at large, the one disease that could not be predicted was Ebola, which mainly affects people in West Africa, where Internet penetration is low; Wikipedia articles about Ebola were mainly accessed in rich countries, where Internet use is widespread but Ebola is exceedingly rare.) What the comparison between disease prediction using Google Flu Trends and Wikipedia highlights is that knowledge derived from Wikipedia is open and can be built upon. Indeed, the Wikipedia researchers in this case have made their datasets available and have encouraged others to replicate, improve upon, or criticize their results.
This brings us to another feature of big data studies: it is often not known who the users are that leave digital traces. This point also applies to studies of Wikipedia. However, West, Weber, and Castillo (2012) had access to log data about the behavior of people who edited Wikipedia and who used the Yahoo! toolbar in their web browser. They studied these contributors to analyze the kinds of information they seek, which allowed them to link the edits the contributors made to Wikipedia with the knowledge they brought to the task, as reflected in their browsing behavior. What they found, among other things, is that contributors to the entertainment-related part of Wikipedia, which makes up seven of the ten largest categories of article topics (West et al., 2012, section 6), look for more information on these topics than those who are not editors; put differently, editors seem to be more expert than others in the sense that they seek more information. Or again, when the authors break down this expertise into “science, business, and humanities” as against “entertainment-related” editing, they find that the former editors are more “generalist,” whereas entertainment-related contributions come “from editors immersed primarily in popular culture.” These findings make a start on building up a picture of what kinds of people contribute what kinds of knowledge to Wikipedia (it can be added that the authors of the study argue that Yahoo! toolbar users are unlikely to differ from other users). Perhaps these findings also allow inferences about what kinds of knowledge and information different types of people are interested in more generally.
Wikipedia is thus an excellent source of digital data because its data are available to anyone and it is possible to understand how they were generated. But Wikipedia is also a rather special digital medium, used mainly for finding information, unlike media such as television, which push information toward the user (recall the point made earlier that many of the media that generate big data are neither mass nor interpersonal). Yet the role of information seeking in everyday life is not well understood (but see Aspray & Hayes, 2011; Savolainen, 2008). In the case of Wikipedia, Waller (2011) analyzed “The search queries that took Australian Internet users to Wikipedia” (as the title of her paper has it). To do this, she had access to log data for Australian Internet users from the marketing company Hitwise Experian (again, big data studies often depend on commercial data sources). Almost all visitors to Wikipedia (93%), she found, came via Google. She also found that the entries people sought were quite diverse: of the 600,000 search queries that took users to Wikipedia, at least 400,000 appeared only once. Given that Wikipedia is one of the world’s most popular websites, this diversity in what people are looking for is notable. In terms of content (Waller classified a subset of queries), 36% pertained to popular culture, such as movie or music stars (mostly American), and 2% to high culture; 7% pertained to science and 6% to history. At the same time, she found only minor, though statistically significant, differences in the kinds of content sought by groups differing in income and other demographic characteristics. And again, it is worth stressing that this kind of access to user demographics is rare in academic big data studies.
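Waller’s long-tail observation amounts to counting how many distinct queries occur exactly once. The sketch below assumes a plain list of query strings; the sample log is hypothetical, and normalizing case and surrounding whitespace before counting is an assumption rather than a description of her procedure.

```python
from collections import Counter


def query_tail_stats(queries):
    """Return (distinct queries, queries seen exactly once, share of distinct queries seen once)."""
    # Assumed normalization: ignore case and surrounding whitespace.
    counts = Counter(q.strip().lower() for q in queries)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons, singletons / len(counts)


# Hypothetical referral log: search queries that led users to Wikipedia.
referral_log = ["taylor swift", "Taylor Swift", "battle of hastings",
                "photosynthesis", "taylor swift ", "ned kelly"]
distinct, seen_once, share_once = query_tail_stats(referral_log)
```

On Waller’s own figures (at least 400,000 of 600,000 queries appearing only once), the share of one-off queries would be at least two-thirds.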
To summarize, from the point of view of communication research, as opposed to research about the nature of Wikipedia entries or about online collaboration, two questions we might want to ask are: Who reads Wikipedia? And what does this source of knowledge provide that other sources do not? Many big data studies have instead focused on a topic for which data are readily available: Who contributes to Wikipedia, and how do they collaborate? These are important topics, but surely an equally important one is who makes use of Wikipedia, and about that there are also abundant data (there is a website, www.stats.grok, where data can be obtained about which Wikipedia entries are accessed). Yet this is a topic about which we know little, perhaps because researchers have found it more interesting to examine how knowledge is produced than how it is consumed. In any event, big data studies using Wikipedia provide findings about one of the most popular sources of online information, though these are hard to fit into theories of “mass” or interpersonal media (and in this case, information on Wikipedia is mainly reached via Google, which again goes beyond traditional theories of communication). Another key point is that big data research often addresses topics that can readily be analyzed (the nature of entries and how collaboration works), rather than topics that may be of broader interest to communications scholars (what information people look for on Wikipedia). Finally, Wikipedia is a noncommercial platform that makes its data freely available to researchers and is transparent about how the data were produced. As we shall see, this feature sets Wikipedia apart from most other sources of big data.
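To illustrate what such access data look like in practice, the sketch below builds a request URL for the Wikimedia REST pageviews endpoint and totals the daily counts in a response. This endpoint is an assumption on our part (it is not the site named above); the default dates and article titles are illustrative, and the network request itself, together with the descriptive User-Agent header Wikimedia asks clients to send, is omitted so the code runs offline.

```python
import json
from urllib.parse import quote

# Assumed endpoint: the Wikimedia REST API per-article pageviews route.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, project="en.wikipedia", start="20150701", end="20150731"):
    """Build a URL for daily views of one article by human users across all access methods."""
    title = quote(article.replace(" ", "_"), safe="")  # titles use underscores for spaces
    return f"{API_ROOT}/{project}/all-access/user/{title}/daily/{start}/{end}"


def total_views(response_text):
    """Sum the 'views' field over the 'items' list of a JSON pageviews response."""
    data = json.loads(response_text)
    return sum(item["views"] for item in data.get("items", []))


# Offline usage with a hand-written response body instead of a live request.
sample_response = ('{"items": [{"article": "Influenza", "views": 120},'
                   ' {"article": "Influenza", "views": 80}]}')
```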
Facebook and Other Social Network Sites
Research about Facebook has led to two major controversies. The first concerned the possibility of de-anonymizing the data and was thus about privacy. It was triggered by an early study which, though not necessarily on the scale of “big data,” is mentioned here because it raised the issue not just of privacy but also of whether this data source could be made publicly available. The study analyzed students’ online ties (their Facebook “friends”) and their offline ties in relation to their cultural “tastes” (Lewis, Kaufman, Gonzalez, Wimmer, & Christakis, 2008). The research caused a stir when it was discovered that the university where the study was done, and perhaps the students, could be identified (Zimmer, 2010), which led the researchers to not make the data available to others.
The second controversy also involved questions of privacy, but in addition raised an issue distinctly related to big data. In this case, researchers experimented by dividing Facebook users, almost 700,000 in all, into groups (Kramer, Guillory, & Hancock, 2014): for one group, posts containing positive words were reduced in their News Feeds; for another, posts containing negative words were reduced. The researchers then measured whether these users subsequently, in the light of the two “treatments,” themselves posted fewer positive or negative words. They found that they did, which they interpreted as evidence of “emotional contagion.” The issue raised in this case is whether Facebook, and in particular the academic researchers who took part in the study, should carry out experiments that manipulate Facebook users (Schroeder, 2014b). What we see here is that, regardless of the findings, the insights gained could be used to influence Facebook users’ communication patterns.
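The logic of such a design, random assignment to feed conditions followed by a comparison of what each group later posts, can be sketched in a few lines. Assumptions: a toy word list stands in for a sentiment lexicon, the per-user shares are invented, and Cohen’s d is used as one common effect-size measure rather than as a reconstruction of the analysis Kramer et al. (2014) reported.

```python
from statistics import mean, stdev


def positive_share(words, positive_lexicon):
    """Fraction of a user's posted words that appear in a positive-word lexicon."""
    if not words:
        return 0.0
    return sum(w in positive_lexicon for w in words) / len(words)


def cohens_d(treated, control):
    """Standardized mean difference (treated minus control) using a pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * stdev(treated) ** 2
                  + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5


# Hypothetical per-user shares of positive words posted after the manipulation.
reduced_positive_feed = [0.04, 0.05, 0.03, 0.04]
control_feed = [0.05, 0.06, 0.05, 0.06]
effect = cohens_d(reduced_positive_feed, control_feed)
```

A negative value here means that users whose feeds contained fewer positive posts went on to use a smaller share of positive words themselves.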
It is rare in communications research, especially at the scale of hundreds of thousands of people as opposed to small groups in a laboratory, to be able to influence people’s behavior in this way. Another study that illustrates how social network sites influence behavior was done by Bond et al. (2012), who tested whether different types of messages encouraging Facebook users in the United States to vote, including messages showing which of their friends had already voted, could lead more of them to vote. Indeed, it was found (among other things) that close friends had a more powerful effect on turnout than more distant friends, and that the effect extended to friends of friends (two degrees of separation). The authors of the study argue that this kind of influence or mobilization of potential voters via social network sites could play an important role in close elections.
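Distinguishing direct friends from friends of friends is a matter of breadth-first search over the friendship graph. A minimal sketch on a hypothetical graph stored as an adjacency dictionary: it finds who sits at one and two degrees of separation from a given user, but says nothing about tie strength, which would require separate data such as interaction frequency.

```python
from collections import deque


def users_by_distance(friends, seed, max_depth=2):
    """Breadth-first search: map each user within max_depth hops to their distance from seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            continue  # do not expand beyond max_depth hops (friends of friends by default)
        for friend in friends.get(user, ()):
            if friend not in distance:
                distance[friend] = distance[user] + 1
                queue.append(friend)
    return distance


# Hypothetical undirected friendship graph.
friends = {
    "ana": ["ben", "cai"],
    "ben": ["ana", "dev"],
    "cai": ["ana"],
    "dev": ["ben", "eli"],
    "eli": ["dev"],
}
reach = users_by_distance(friends, "ana")
```

Here “dev” is a friend of a friend of “ana,” while “eli,” three hops away, falls outside the two-degree range.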
Yet another line of inquiry about Facebook has been about whether “friends” who share content also share political views or political ideologies. Bakshy, Messing, and Adamic (2015) investigated this question for more than ten million American Facebook users, and found that Facebook friends are ideologically quite diverse, which is partly because their ties reflect offline networks such as family, school, and work—in contrast with Twitter users, who share common interests or topics but not necessarily offline ties, and who are therefore more ideologically polarized (Conover et al., 2011).
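At the level of ties, ideological diversity can be summarized as the share of a user’s friends whose ideology differs from the user’s own. This is a simplified sketch: the ideology labels and friend lists are hypothetical, and Bakshy et al. (2015) also examined the content users were exposed to, which this tie-level measure does not capture.

```python
def cross_cutting_share(friend_lists, ideology, user):
    """Share of a user's labeled friends whose ideology label differs from the user's.

    Returns None when the user, or all of their friends, lack a label.
    """
    if user not in ideology:
        return None
    labeled = [f for f in friend_lists.get(user, ()) if f in ideology]
    if not labeled:
        return None
    return sum(ideology[f] != ideology[user] for f in labeled) / len(labeled)


# Hypothetical friend lists and ideology labels ("u5" has no label and is skipped).
friend_lists = {"u1": ["u2", "u3", "u4", "u5"]}
ideology = {"u1": "liberal", "u2": "liberal", "u3": "conservative", "u4": "liberal"}
```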
A more recent study (Settle et al., 2016) examined the content of Facebook messages, in particular “status updates,” which most Facebook users (73%, according to Hampton, Goulet, Rainie, & Purcell, 2011) post at least once a week. It can be added that 60% of American adults use social network sites such as Facebook, and 66% of these users have used them for civic or political activity (Rainie, Smith, Schlozman, Brady, & Verba, 2012). Using machine learning techniques, the researchers separated out political content related to the 2008 US presidential election and the 2009 health care reform debate. They were able to show that political messages closely track major events (in the case of the election, these included the party conventions, the election itself, and the inauguration; in the case of the health care debate, the shift in the discussion from the term “health care reform” to “Obamacare”). They could also see peaks and troughs in the use of emotional language. Again, given the large number of Americans and others who exchange political messages on Facebook, these are important findings.
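The general technique of separating political from other content can be illustrated with a supervised text classifier trained on hand-labeled examples. The sketch below uses multinomial naive Bayes with add-one smoothing, a generic stand-in chosen for brevity rather than the method Settle et al. (2016) used; the training posts are invented and far too few for real use.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real study would use richer preprocessing."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Train a multinomial naive Bayes model with add-one (Laplace) smoothing."""
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "n_docs": len(docs), "vocab": vocab}


def classify(model, doc):
    """Return the label with the highest log posterior probability."""
    best_label, best_score = None, -math.inf
    v = len(model["vocab"])
    for label, counts in model["word_counts"].items():
        total = sum(counts.values())
        score = math.log(model["doc_counts"][label] / model["n_docs"])  # log prior
        for token in tokenize(doc):
            if token in model["vocab"]:  # skip words never seen in training
                score += math.log((counts[token] + 1) / (total + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training posts (invented for illustration).
train_docs = [
    "obama wins the election tonight",
    "health care reform vote in congress",
    "great pizza with friends tonight",
    "watching the game with family",
]
train_labels = ["political", "political", "other", "other"]
model = train_naive_bayes(train_docs, train_labels)
```

Once such a classifier labels each status update, counting political posts per day yields the kind of time series in which peaks around major events become visible.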
At the same time, Facebook is only one of several social network sites, even if it is the dominant one in the United States and across most of the world. Many people do not use Facebook, which may limit how far the findings of these studies can be generalized. Further, there are parts of the world where Facebook is almost nonexistent, such as China (where the government has banned it), but also Russia, where a rival social network site (VKontakte) has become dominant. This again raises the question, for studies of Facebook users (as we saw in the case of Chinese Wikipedia), of which part of the population these studies represent. And as mentioned, the extent to which Facebook users accept being analyzed, even if their privacy and anonymity are safeguarded, and whether the data can be made available to other researchers, will continue to be challenges. In the meantime, research on how people behave on this popular social media platform, and in particular on how they influence each other, is bound to keep growing.
The World Wide Web as a Source of Online Information
The Web is less associated with communication research using big data than the other digital media discussed here. This is partly because there are surprisingly few studies of what the Web can tell us about offline social relations generally (but see Bruegger & Schroeder, 2017). Studies of information seeking via Wikipedia (see “Wikipedia and Other Information Sources”) or search engines (see “Search Engines”) are of course also studies of the Web, and in this sense yield insights into society. Yet the Web can also be seen as an entity in itself, which tells us what kind of information can be found online, how this information is organized or interlinked, and how it reflects (or reflects in a distorted way) wider changes in society.
One big data approach to the shape of the Web has been to test whether what is accessed online reflects offline political, cultural, or linguistic borders. This is an interesting question because it has often been claimed that the Web is a unique medium insofar as it can be accessed from anywhere, unlike traditional media, which are confined, for example, by national broadcasting regulations or by the reach of transmitters. In other cases, most notably China, it has conversely been argued that the government and its censorship regime ring-fence the Web, making it into a cultural resource whose reach is circumscribed by the state. Both ideas are misleading, as Taneja and Wu (2014) have shown. The Chinese Web is no more sharply bounded off from the rest of the Web than other large non-English-language clusters. Taneja and Wu arrive at this finding by examining traffic to the top 1,000 websites (which together receive more than a 99% share of attention globally), and then grouping these into sites that receive shared attention. Two sites share attention when people who visit one also visit the other more often than would be expected by chance. In the case of China, apart from language, the state’s active information and communication technology policy may have promoted a Chinese-centric Web, as in other cases such as Korea. But the Chinese Web is not as straightforwardly or uniquely circumscribed by a wall of censorship as is commonly thought. Instead, it may simply be that Chinese citizens, like those of other nations, are primarily interested in content produced in their own country.
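The shared-attention idea can be made concrete by comparing the observed overlap between two sites’ audiences with the overlap that would be expected if visits were independent, that is, the product of the two audience sizes divided by the total number of users. A minimal sketch with hypothetical site names and visitor sets; published analyses typically add a significance test and a clustering step, so the bare observed-greater-than-expected rule here is a simplification.

```python
from itertools import combinations


def shared_attention_ties(visitors, population):
    """Site pairs whose audience overlap exceeds the overlap expected by chance.

    visitors: dict mapping each site to the set of user ids who visited it.
    population: total number of users in the panel.
    """
    ties = []
    for a, b in combinations(sorted(visitors), 2):
        observed = len(visitors[a] & visitors[b])
        # Under independence, expected overlap = reach(a) * reach(b) / population.
        expected = len(visitors[a]) * len(visitors[b]) / population
        if observed > expected:
            ties.append((a, b, observed, expected))
    return ties


# Hypothetical panel of 10 users and four sites.
panel_visits = {
    "cn_news": {1, 2, 3, 4},
    "cn_portal": {1, 2, 3},
    "uk_news": {5, 6, 7, 8},
    "us_portal": {4, 5, 6, 7},
}
ties = shared_attention_ties(panel_visits, population=10)
```

The two ties that survive link the two hypothetical Chinese-language sites and the two English-language sites, the kind of pattern that, at scale, produces language-based clusters.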
Wu and Taneja (2015) have extended this analysis to argue that the “thickening” of the Web has changed over time. Whereas in 2009 a global/US cluster was both the most central and the largest on the Web, in 2011 it was overtaken by a Chinese cluster, and there was no longer a global/US cluster; instead, a US/English cluster was in second place, followed by a global cluster. The same two clusters occupied the top two spots by size in 2013, but the global cluster (of websites that are not language specific, such as Mozilla and Facebook) had slipped to eighth place (India was ninth and Germany tenth), followed by a number of other clusters, including sites in Japan and Russia, but also Spanish-language sites and those in Brazil and France. What we see here is a Web that is becoming more oriented toward the global South (Spanish-language sites, sites in Brazil, and also India). We also see that, over time, websites of “global” status have become fewer among the world’s top 1,000 sites, and that language plays an increasing role. State policies promoting information and communication technologies are one factor here, and shared language another. Whatever the most important factors turn out to be, the Web is not becoming a single whole, but rather a series of clusters: some linguistic, others shaped by the policies of states or by sites promoting shared interests such as commerce or personal relations. (It can be added that Taneja and Wu used data from ComScore, a company that analyzes web traffic, to arrive at their findings.)
Again, against the background of this kind of large-scale analysis of the shape of the Web, as with Wikipedia and other sources of online information, we would need to know how people use the Web in everyday life. But research on how people search for information is still thin on the ground (Aspray & Hayes, 2011; Savolainen, 2008), particularly in relation to how people find information on the Web (Rieh, 2004; Schroeder, 2014a). A major issue that has not yet been resolved in communication studies is where to “put” information seeking in general. A simple way to grasp this point is to ask: where did people seek information before the Web became widely used, say, before the mid-1990s? (The same point could be raised, of course, in relation to Wikipedia and to search engine behavior.) They might have consulted a printed encyclopedia instead of Wikipedia, a travel agent instead of a travel website, a pamphlet instead of a blog, and so on. Yet these “media” were also not much studied. What makes the Web different is that it contains all of this information, but also that none of these uses of the Web fit easily into the categories used to study offline behavior or other digital media, or indeed into the categories of mass and interpersonal communication. What these uses do fit is the subject matter of information science, but that is a discipline with which communication scholars barely overlap (and information science rarely examines ordinary everyday behavior). In any event, the Web, as a large and accessible source of data that is increasingly important in people’s lives, is bound to grow as a basis for big data research.
Twitter and Microblogging
Twitter has been among the most commonly used sources for academic research using big data techniques. One criticism often made of studies of Twitter (as of other social media, as we have seen) is that it is not known who the users are, and hence what part of the population, perhaps a very skewed one, they represent. Barberá and Rivero (2014) addressed the problem of representativeness by analyzing all tweets relating to the two main candidates in the Spanish legislative elections of 2011 and the American presidential elections of 2012, in each case for 70 days before the election. They found that Twitter users are disproportionately male, somewhat biased toward urban areas, and “highly polarized: Users with clear ideological leaning are much more active and generate a majority of the content” (2014, p. 3). Since they were also able to identify the follower networks of these users, they could show to what extent it mattered, in terms of the “reach” of their tweets, that a small number of users generated the majority of the content. What we can see here is that one way to overcome the problem of the representativeness of Twitter users is to focus on Twitter use for specific purposes. What we also see, however, is that overcoming an unrepresentative population in this way takes considerable effort, including mapping the network of users. Further, even when this effort has been made, it will still be necessary to consider the larger context of how Twitter differs from other media, in this case during elections, especially since, compared with, say, television, the role of Twitter is more protean.
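Whether a small number of users generate most of the content can be checked with a simple concentration measure: the share of all tweets posted by the most active users. The counts below are hypothetical, and the 10% cut-off is an arbitrary choice for illustration, not a reconstruction of Barberá and Rivero’s analysis.

```python
def top_share(tweet_counts, top_fraction=0.1):
    """Share of all tweets posted by the most active top_fraction of users."""
    counts = sorted(tweet_counts, reverse=True)
    k = max(1, round(len(counts) * top_fraction))  # always include at least one user
    return sum(counts[:k]) / sum(counts)


# Hypothetical tweet counts for ten users.
tweets_per_user = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
```

Here the most active tenth of users accounts for half of all tweets, and the top fifth for 70%.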
Another study that gets close to who the users are, though with a different approach, was done by Bastos and Mercea (2015). They analyzed 20 million tweets in 2009–2013 for 193 political hashtags (such as those related to the Occupy movement or the protests in Iran). Then they focused on those Twitter users who frequently tweeted across several of these hashtags. These they labeled “serial activists” and interviewed 21 of them, as well as reconstructing their networks and those of their followers. What they found was that these serial activists are highly interconnected with each other, even across language barriers (which they overcome partly through peer support, and partly by means of Google Translate). But they were also able to identify who these activists were: urban adults who were much older (average age 45) than average Twitter users, typically on a low income, with a high proportion of IT professionals, and who often lived in cities that had longer periods of Occupy movement camps. The serial activists were found to be motivated by idealism (“expressive” rather than “instrumental” motives) and most of them had also participated in offline protests. Further, they displayed long-term commitment rather than short bursts of activity, refuting the idea that Twitter represents a shallow or “clicktivist” political commitment. What is interesting here is that “big” data can be combined with qualitative “small” data to show not only how political activists are located in larger social networks, but also their characteristics and motivations. This approach is useful in view of the fact that large-scale analyses of political and other forms of communication often show only highly abstract patterns of activity or patterns in networks that may be difficult to relate to what people actually do (in this case, how they are involved in political protest) and how they actually work together and interact.
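The first step in such a design, picking out users who post across many political hashtags, can be sketched as follows. The input format of (user, hashtag) pairs and the threshold of three distinct hashtags are assumptions for illustration; Bastos and Mercea’s own selection criteria are not reproduced here.

```python
from collections import defaultdict


def serial_activists(tweets, min_hashtags=3):
    """Users who posted under at least min_hashtags distinct hashtags.

    tweets: iterable of (user, hashtag) pairs; hashtags are compared case-insensitively.
    """
    tags_by_user = defaultdict(set)
    for user, hashtag in tweets:
        tags_by_user[user].add(hashtag.lower())
    return sorted(u for u, tags in tags_by_user.items() if len(tags) >= min_hashtags)


# Hypothetical (user, hashtag) pairs.
sample = [("a", "#occupy"), ("a", "#iranelection"), ("a", "#15M"),
          ("b", "#occupy"), ("b", "#Occupy"), ("c", "#OWS"), ("c", "#occupy")]
```

The users selected this way would then be the starting point for network reconstruction and, as in the study, for interviews.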
A different way of putting Twitter in a larger context can be illustrated by the study by Neuman, Guggenheim, Mo Jang, and Bae (2014). They examined agenda setting in traditional compared with online news media, using data from newspapers and television over several decades and, more recently, from Twitter, blogs, and discussion forums. They asked: Do social media (in this case) change agenda setting compared with traditional media? Among their findings: “Social media are more responsive to public order and social issues and less responsive to the abstractions of economics and foreign affairs” (2014, p. 7). This finding has interesting implications, since it suggests that what journalists are interested in differs from what people are interested in when they generate content themselves (see also Boczkowski & Mitchelstein, 2013). Here we therefore have a unique study that both compares new and traditional media and specifically tests an existing theory in media and communications, agenda setting, with the interesting result that old and new media are different, even if, as the authors point out, it is increasingly difficult to tell them apart.
Like other digital media, Twitter is global, and yet it is used to varying degrees across the globe and competes with other microblogging services. In China, it is officially banned, even if many mainland Chinese still use it (Sullivan, 2012). It is also an evolving medium, used by protest movements and journalists, for example, but also by ordinary people as a means of sending short messages in the manner that text messages are sent via phones. Again, although some studies exist (Duggan, 2015), relatively little is known about who the users are. And as with the other media and data sources discussed here, Twitter illustrates the issue of publicly versus commercially available data (Puschmann & Burgess, 2013). Twitter makes a small, nonrepresentative sample of its data publicly available via an application programming interface (API): 1% or so, although Twitter’s policies have changed over time and also vary by topic, and sometimes up to 10% can be obtained. Many academic studies have used this “sample” as opposed to the whole dataset (the “firehose,” available only for purchase). But there has been much discussion about the extent to which the 1% sample is biased, and González-Bailón, Wang, Rivero, Borge-Holthoefer, and Moreno (2014) have shown, by examining Twitter during a political protest, that a 1% sample (which they compared with the complete dataset) is highly problematic for drawing valid conclusions about protest messages. Nevertheless, Twitter has been widely used to study political communication; among other things, researchers have analyzed hashtags related to protests and mentions of parties as a means of predicting elections (Jungherr, 2015). Twitter is likely to remain a popular source for research, if only because it is relatively easily accessible.
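Part of the problem with small samples can be seen even under the most favorable assumption, a uniform random sample of tweets, which the public sample is not guaranteed to be. A user with k tweets is missed only if all k are dropped, so with a sampling rate p the chance of appearing at least once is 1 - (1 - p)^k. The sketch below averages this over users; the activity figures are hypothetical.

```python
def expected_user_coverage(tweets_per_user, p=0.01):
    """Expected share of users appearing at least once in a uniform p-sample of tweets.

    A user with k tweets is missed only if all k are dropped, with probability (1 - p) ** k.
    """
    return sum(1 - (1 - p) ** k for k in tweets_per_user) / len(tweets_per_user)


# Hypothetical activity: 99 occasional users with one tweet each, one user with 500.
activity = [1] * 99 + [500]
coverage = expected_user_coverage(activity)
```

With these figures, a 1% sample would be expected to contain only about 2% of the users, while the heavy user is almost certain to appear, which skews any network built from the sample toward the most active accounts.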
As we have seen, what information people search for has been a subject of great interest, and big data are abundant in this case, even if, as we saw with the Google Flu Trends study, this research is often difficult to replicate because of the proprietary and hence “blackboxed” nature of the data. Anderegg and Goldsmith (2014) used Google Trends to examine a different topic: attitudes to climate change. They did this by focusing on two events that have been labeled “climategate” scandals. One concerned leaked emails from climate researchers that allegedly showed they had “covered up” results that downplayed the threat of climate change. The other was a similar alleged misrepresentation of results about melting glaciers arising from a report of the Intergovernmental Panel on Climate Change (IPCC). These two events, in late 2009 and early 2010, were widely covered in the media and could have been expected to lead to a shift toward greater skepticism about climate change. Anderegg and Goldsmith tracked Google searches for keywords related to climate change around these events. They found that, although the events led to a spike in searches indicating increased skepticism about climate change, this effect was transient, lasting only a couple of months (for example, searches for “global warming hoax” spiked during and shortly after the events, but normalized thereafter). What the authors also found, however, was “a strong decline in public attention to climate change since 2007” and up to 2013 (Anderegg & Goldsmith, 2014, p. 6). This finding should be seen in the context of the Web having become the single most important source of scientific information, at least in the United States (Horrigan, 2006).
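The notion of a “transient” spike can be made operational in a simple way. The following is a hypothetical sketch, not Anderegg and Goldsmith’s method: it assumes a weekly search-interest series (for instance, an index exported from Google Trends), takes the median of the weeks before an event as a baseline, and counts how many weeks it takes for interest to fall back to within a chosen tolerance of that baseline. The function name, the eight-week baseline window, the 20% tolerance, and the toy values are all illustrative assumptions.

```python
from statistics import median

def weeks_until_baseline(series, event_index, baseline_window=8, tolerance=1.2):
    """Hypothetical helper. Return the number of weeks after `event_index`
    until the series first falls to within `tolerance` times the pre-event
    baseline, or None if it never does. The baseline is the median of the
    `baseline_window` weeks immediately before the event."""
    if event_index < baseline_window:
        raise ValueError("not enough pre-event weeks to estimate a baseline")
    baseline = median(series[event_index - baseline_window:event_index])
    for offset, value in enumerate(series[event_index:]):
        if value <= tolerance * baseline:
            return offset
    return None

# Invented weekly index values: flat interest, a spike at week 8, then decay.
toy_series = [10] * 8 + [60, 45, 30, 20, 11, 10, 10]
print(weeks_until_baseline(toy_series, event_index=8))  # prints 4
```

In practice, a result of this kind is sensitive to the choice of baseline window and tolerance, which would need to be justified in any real analysis.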
Another interesting question is what people search for in general when they use search engines. Waller’s study (2011), discussed earlier, had access to the logs of Australian Internet users, almost 90% of whom use Google. She found that e-commerce and popular culture topics accounted for almost half of all queries. She also had access to the users’ demographics (as mentioned, from the company Hitwise Experian) and found that queries did not differ across demographic groups. Further, 48% of queries were not really queries in the sense of looking for information at all; they were “navigational” searches, where users had a specific website in mind (such as Facebook or the BBC) and merely used a search engine to get to the site.
Along similar lines, Segev and Ahituv (2010) studied the 150–200 most popular searches on Google and Yahoo! across 21 countries, finding results similar to Waller’s for Australians, such as the preponderance of popular culture or entertainment searches. What these studies of search-engine behavior show is that a picture of people’s information interests can be built up from how they seek information about various topics. Again, if these findings can be put into context, including how they relate to which search engines are most popular where, and how people use search engines in combination with other sources, then it will be possible to build up a rich picture of people’s social behavior: in this case, neither interpersonal nor mass communication behavior, but information-seeking behavior.
There have been many studies about how often people connect with others, and across what kinds of distances, via telephones (Fischer, 1992), and more recently, on a large scale, via mobile phones (Licoppe, 2004). Ling, Bjelland, Sundsøy, and Campbell (2014) showed that our regular and most frequent contact via mobile phones, both text and voice, is nevertheless with a small number of people. They analyzed three months of mobile call records from Norway’s dominant mobile operator and found that most connections are with a small group of people who are close by: “the mobile phone . . . is used in the maintenance of everyday routines with a relatively limited number of people in a relatively limited physical sphere of action . . . the stronger is our tie . . . the closer they are likely to be geographically” (2014, p. 288). Like Fischer for the landline telephone, they thus disconfirm the often-mooted idea of “the death of distance” or of a “global village.” They could also distinguish between rural populations, where “the largest proportion of calls is to those who are less than 1 km away,” and urban ones, where “the preponderance of calls goes to people who are more than 1, but less than 24, km distant” (2014, p. 288). This is a counterintuitive finding, since it might be expected that rural people would call more distant people and urban people closer ones. However, if we consider the distances that urban people typically drive, including to get to work, and the ages of urban and rural populations, the findings make sense (and may have implications for transport policy and for mobile operators’ charging policies, among other things).
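The kind of tabulation behind such findings can be sketched as follows. This is a hypothetical illustration, not the procedure used by Ling et al.: it assumes that each call record has already been reduced to an area type (rural or urban) and an estimated caller–recipient distance in kilometers (how such distances are derived from network data is not described here), and it computes the share of calls in three distance bands mirroring the thresholds quoted above. The band boundaries, function names, and records are invented for illustration.

```python
from collections import Counter, defaultdict

# Distance bands in km, mirroring the thresholds quoted above (upper bound exclusive).
BANDS = [(0.0, 1.0, "<1 km"), (1.0, 24.0, "1-24 km"), (24.0, float("inf"), ">=24 km")]

def distance_band(km):
    """Map a distance in kilometers to its band label."""
    for low, high, label in BANDS:
        if low <= km < high:
            return label
    raise ValueError(f"invalid distance: {km}")

def band_shares(records):
    """records: iterable of (area_type, distance_km) pairs, one per call.
    Returns {area_type: {band_label: share_of_calls}}."""
    counts = defaultdict(Counter)
    for area, km in records:
        counts[area][distance_band(km)] += 1
    shares = {}
    for area, band_counts in counts.items():
        total = sum(band_counts.values())
        shares[area] = {label: band_counts[label] / total for _, _, label in BANDS}
    return shares

# Invented toy records, for illustration only.
toy_calls = [
    ("rural", 0.3), ("rural", 0.5), ("rural", 0.8), ("rural", 2.0), ("rural", 30.0),
    ("urban", 0.5), ("urban", 3.0), ("urban", 8.0), ("urban", 15.0), ("urban", 40.0),
]
for area, shares in band_shares(toy_calls).items():
    top = max(shares, key=shares.get)
    print(area, top, round(shares[top], 2))
```

With real call records, the same tabulation would be run over millions of rows, and the main analytical difficulty would lie in estimating the distances and classifying areas, not in the counting itself.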
How mobile phones, as smartphones, are being used to access the Internet is still not well understood. Perhaps the difference between smartphones and the Internet is becoming blurred, though Napoli and Obar (2015) argue that mobile phone users represent an “underclass” because mobile Internet access offers much more limited functionality than access via a desktop or laptop computer. They base this argument on a review of studies showing that desktop and laptop computers are more useful for content creation and complex tasks, whereas smartphones are mainly used for more passive and constrained ones. Nevertheless, this argument is counterintuitive, since young people in high-income countries in particular use smartphones ever more to do a wide variety of things. Still, it is important to remember, as Donner (2015) points out, that these affluent and highly skilled smartphone users are a small minority worldwide. Moreover, affluent users almost invariably also have Internet access via laptops and other devices such as tablets, as well as high-bandwidth connections at a (relatively) low cost, so even if their smartphone uses are constrained, they can do more demanding tasks on other devices. Users in low-income countries in Southeast Asia and Africa, in contrast, have a “metered mindset,” with scarce bandwidth that is (relatively) very expensive and thus used frugally. These users “dip and sip” rather than “surf and browse,” as Donner puts it, and they are also likely to have far more limited skills, with uses restricted by the affordances of their smartphones. This new digital divide may close over time, but the difference between the vast majority of smartphone-only users and a minority of users with multiple devices is also likely to remain a deep fault line for many decades to come.
In any event, it can be foreseen that big data studies of mobile phones, and mobile Internet use, will become much more prominent in the future, partly because in large parts of the world, including India and China, this is the dominant way in which the Internet is accessed. Put differently, the vast majority of Internet users, globally, will for the most part access the Internet via a mobile device, and they may never have access to a laptop or desktop computer. This will be the population that will be of greatest interest to social scientists of all types, and hence smartphones as a source of big data, including the location of phone users, will become ever more important. At the same time, this type of data, which includes geographical location, is obviously more sensitive than other media or communications data.
It is also true that here, as with other data sources, access to commercial data is a precondition for carrying out studies like those by Ling et al. (2014) and Licoppe (2004). In these two cases, researchers had access to data from the phone networks studied, Telenor and France Telecom, respectively. A further issue in carrying out such studies concerns the territory covered by the service provider, or rather, in most cases, by several service providers. This challenge also applies when mobile phones are studied for the purpose of crisis communication and disease control (see Bengtsson, Lu, Thorson, Garfield, & Von Schreeb, 2011, for a study of the movement of people during the Haiti earthquake). Nevertheless, an advantage of this type of data, when it is available, is that it provides a particularly rich source. Boase and Ling (2013) showed that log data about mobile phone use are more accurate than self-report (though again, log data may be difficult to obtain since they are owned by mobile phone operators, unless researchers collect them from users themselves).
The contrast between log data and surveying people shows the advantages of obtaining “digital footprints” as against asking people about their uses of digital media: obtaining data directly cuts out the potential biases of user self-report. On the other hand, a mobile phone, like any digital device, could be used by more than one person, or one person could use several devices, and these are just some of the errors that could be avoided by asking people directly. Multimethod studies that combine log data and self-reports or interviews may be the way forward here, though they are obviously more resource intensive. Still, they can overcome the problem that digital data are more revealing about user behavior in some senses and less so in others.
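One simple way to compare the two kinds of measurement, in the spirit of (though not reproducing) Boase and Ling’s comparison, is to pair each respondent’s self-reported count with the logged count for the same period and compute a signed bias, a mean absolute error, and a correlation. The sketch below assumes that such paired data already exist and that each log is correctly matched to a single person; the function name and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean

def compare_self_report(pairs):
    """pairs: list of (self_reported, logged) counts for the same respondent
    and period. Returns the signed mean error (positive = over-reporting),
    the mean absolute error, and the Pearson correlation of the two measures."""
    reported = [rep for rep, _ in pairs]
    logged = [logd for _, logd in pairs]
    errors = [rep - logd for rep, logd in pairs]
    mean_rep, mean_log = fmean(reported), fmean(logged)
    # Pearson r = sum of cross-deviations / sqrt(product of summed squared deviations)
    cross = sum((rep - mean_rep) * (logd - mean_log) for rep, logd in pairs)
    ss_rep = sum((rep - mean_rep) ** 2 for rep in reported)
    ss_log = sum((logd - mean_log) ** 2 for logd in logged)
    r = cross / sqrt(ss_rep * ss_log) if ss_rep and ss_log else float("nan")
    return {"bias": fmean(errors), "mae": fmean(abs(e) for e in errors), "pearson_r": r}

# Invented toy data: (self-reported calls, logged calls) per respondent.
toy_pairs = [(10, 6), (4, 5), (20, 12), (8, 8)]
result = compare_self_report(toy_pairs)
print(result["bias"], result["mae"], round(result["pearson_r"], 2))
```

In the toy data, self-reports correlate strongly with the logs yet overstate use on average: self-reports may rank respondents roughly correctly while misstating magnitudes. The matching assumption is exactly what shared devices, or one person using several devices, can violate.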
As we have seen, there are many areas in which big data are being used in communication research, and the review here has been able to give only a flavor of this rapidly expanding area. At this point it will be useful to revisit the question of why this area has garnered so much attention, and also to ask: What are its prospects? Big data approaches in communication research (and in other areas of knowledge) take social science in the direction of being more quantitative and statistical, and thus more scientific and more powerful, and it is important to spell out why. Quantitative social science is of course nothing new (Porter, 2008). Nor are efforts to introduce digital tools and data into research (Meyer & Schroeder, 2015). What is new in big data research are the data sources, which provide access to readily manipulable (computable) data. In the past, social science data were hard to come by, mainly requiring face-to-face interviews or telephone surveys; digital data, for their part, are often fraught with difficulties when they are proprietary and/or sensitive. Still, an important point here is that the availability of data is a precondition for the growth of social scientific knowledge: data provide an independent means to check or verify (or falsify) results; they are the raw material that allows researchers to build on each other’s work. Having more of this material, about an aspect of our social lives that is itself rapidly growing, means that this area of research is bound to continue to thrive.
This point can be made differently, by defining data, at least inasmuch as data are part of scientific knowledge. Data belong to (in the nonlegal sense of being a property of) the object under investigation; taking data comes before interpreting them; and data are the most atomic or divisible useful units of analysis (Schroeder, 2014b). The definition of “big” data, data on a scale and with a scope that is a leap beyond what was previously available in relation to a given phenomenon, thus relates directly to the availability of this raw material, especially in a form that can be readily manipulated or computed (and all the studies using big data in this article certainly meet this definition). No wonder, then, that communication research, and social research more generally, have recently seen a surge of studies in this area, especially as the software tools to handle these data have also recently proliferated (see Bright, 2017). The caveat that the data need to meet the criteria of science, namely being open to validation and replication, need not be stressed again, as it has been amply discussed above. Yet these criteria are often not met, so that while quantitative or scientific knowledge is rapidly advancing in one sense, it also rests on uncertain foundations in another (though again, Wikipedia is a counterexample, and there are others, such as data about the Web and its links). Further, the validity of relying on samples of big data, as we saw in the case of Twitter, can itself be subjected to rigorous investigation.
As an aside, it can be mentioned that another feature of scientific advance that often applies to big data research is that studies build and improve on each other; that is, “high-consensus rapid-discovery” science (Collins, 1994; Schroeder, 2007): examples include how Wikipedia disease prediction has outperformed Google Flu Trends research (as we have seen) or how Wikipedia results (Mestyán, Yasseri, & Kertész, 2013) have bested Twitter research (Asur & Huberman, 2010) in predicting movie box office success. It can also be noted, however, that big data in communication research are still largely in a phase of high task uncertainty and low mutual dependence (Whitley, 2000): that is, researchers are exploring many new domains, often without a sense of how this research may contribute to cumulation (Rule, 1997). Hence there is a need, in communication studies and in other areas of social science, for dialogue across the various disciplines that are pursuing research based on new sources of digital data. In this respect it may have been noticed that among the studies that have been discussed, many have not been done by communication researchers or social scientists but rather by, for example, computer scientists and researchers in the commercial sector.
Whether the social sciences should be more scientific has of course been a matter of contention. Suffice it to say here that big data approaches can be combined with other, qualitative or mixed methods approaches, as we have seen. Yet the more powerful insights based on these data sources have also been limited, again, for other reasons. First, many studies are not generalizable (or they cannot be built upon) because the data come from proprietary social media or from mobile phones. This means that the findings cannot be replicated, since the data are not accessible to other researchers, or it is not known how the data were generated in the first place. Second, big data findings are often of limited significance because they are aimed at short-term practical goals, such as when big data are analyzed for marketing purposes so that findings may not apply beyond a particular marketing campaign and the specific population being targeted or the products being sold (and in this case, the question of whether the findings are scientific may be moot). Third, studies may be limited because the source of digital data covers only part of the world’s population, even if a large one, for reasons of language or censorship or because a particular platform is only one of several popular ones. Fourth, and this is the reason that has been emphasized here, many digital data sources are being investigated in different directions without a sense of how findings fit into the larger picture of communication research. The first two are practical problems, and the third pertains to the scope of the study. Yet the fourth, which is a question of making an effort in the direction of theorizing, synthesizing, and integrating findings, can be overcome (indeed, the foregoing has pointed to ways of doing so).
Important questions for the future of this area thus include how to compare traditional and new media (the Neuman et al., 2014, study was highlighted as a particularly good example). And in view of the fact that there is such a proliferation of new media, it must be established how these fit into people’s overall media “diets,” and with what effects. Further, it would be useful to know about the demographic and other characteristics of the users of the new digital media (the objects of study to which the data “belong”), and how data shed light not just on the specific social or media behaviors related to the devices they use but also on the larger dynamics of the role of these media in society. Again, much has been said earlier about the fact that these new media have different user populations, partly limited by language, geography, uneven access to the Internet, and other factors. How can these be compared with traditional media, which are often studied at the national or regional levels, and again, how can the sum total of traditional and new media be aggregated into an overall understanding of the role of media within and across societies? These are ambitious questions, but ultimately big data research will no longer be a specialized subfield, but will become part of the larger advance of social scientific knowledge, albeit an ever-growing part because of the increasing amounts of digital data available about social interaction.
Big data studies will thus also require new theories, since digital media uses are changing rapidly. Traditional analog media are steadily declining and being displaced by digital media. One consequence (mentioned briefly at the outset) is that digital media constitute a shift away from interpersonal communication (one to one) and mass communication (one to many) toward interaction at levels between the two, as when content is shared on Twitter and Facebook or search engine results are tailored to a particular group. New media content can be targeted to particular audiences, or shared by users, so that it is neither aimed at mass audiences nor limited to interactions between individuals (again, Twitter and Facebook, but also Google search results, are examples). This does not mean that traditional media are dying out; rather, they are slowly fading as people use digital media more. These new digital media add another layer to how the world is becoming mediated and take another step in the ongoing process whereby technologies tether us more closely to information and to each other (Schroeder, 2010).
If it is possible to make progress in the challenge of locating big data research findings and integrating them within what we know about the role of media (Neuman, 2016), it should not be forgotten that big data research on digital media, for the reasons mentioned (data are often proprietary) but also because this knowledge is useful, is mainly carried out outside academia (Savage & Burrows, 2007, 2009), primarily for marketing and advertising, as well as for policy and government purposes. Audience “engagement,” market shares, health and transportation, and public opinion: these and other aspects of life are increasingly measured in a quantitative way and treated as commercial assets or means of governing. Big data here are part of a broader move toward more scientific, quantitative, and data-driven approaches, not just in communication research but also in the study of politics, policy, and economics. Yet these studies have little value to social scientific knowledge about media unless they can be validated and built upon. Much therefore depends on the data sources. Parks has compared the current situation of communication researchers with what has happened in the past with biomedical researchers who use commercial data: “Communication researchers may have to contend with the fact that companies will grant access only to data that they believe will reflect positively on upon their commercial interests. They will discover, as biomedical researchers have, that sponsorship and assistance often comes with strings” (Parks, 2014, p. 360). Perhaps; though it is worth pointing out that, unlike data in biomedical research, data from social media and mobile phones soon lose their value, so that there may be reasons for commercial and other actors to share at least older data.
There is thus a broader societal context to big data research, which is that communications and social science research is only a small part of the research effort. Big data research is much more widespread in the commercial sector and in government and other organizations, where it is used for practical purposes (social engineering, if you like). The main effect, in the United States, Europe, and elsewhere, is that consumer marketing becomes more effective. Another main area of application is public opinion measurement. However, here the context becomes important: in China, for example, this type of research can be used not just to get feedback from the population but also for systematic surveillance purposes (of course, China is not alone in this, but the preconditions for more powerful uses of this kind are perhaps unique to China, at least on a large scale; see Stockmann, 2013).
These kinds of nonacademic, social engineering uses of big data research will expand and continue to bring benefits (marketing, governance) and dangers (surveillance, manipulation). In the meantime, it is worth remembering that even if the benefits of new digital data sources continue to grow and proliferate for academic or scientific knowledge, and even with the growing role of social media and other digital media, the findings will be limited by the extent to which digital data shed light on user behavior. And as these findings accumulate, they should be fitted into broader knowledge about people’s media uses and patterns of social interaction. In this respect, the problem that new digital media often do not fit the established paradigms of mass versus interpersonal communication can be seen as a useful opportunity to develop new theories of communication rather than as a limitation.
Discussion of the Literature
Big data research is still too new to have established a body of literature. A number of reviews of this research have been produced, such as Ekbia et al. (2015) and Golder and Macy (2014). In relation to communication research in particular, the book by Neuman (2016) is about theories of the Internet generally, but discusses big data on a number of occasions in relation to existing communication theories. Jungherr (2015) provides an overview of Twitter research in politics. Schroeder and Taylor (2015) give an overview of Wikipedia research. Evans and Aceves (2016) review the various automated techniques for analyzing texts. There is also an extensive literature about the ethical, legal, and social implications of big data, as opposed to big data in academic research (though the two sometimes intersect); an overview of these can be found in Pasquale (2015).
The author thanks Cornelius Puschmann for his helpful comments.
Those interested in big data in communication research may wish to obtain overviews, also in relation to particular data sources (see “Discussion of the Literature”). Apart from this, publications about big data in communication research are quite disparate, and interested readers may want to pursue works related to particular types of social media (for example, microblogs such as Twitter or social network sites like Facebook), or they may want to explore particular areas of communication (political communication, marketing, and crisis communication are examples) or particular methods (social network analysis, sentiment analysis, or experiments). An interesting analysis of how big data techniques can be applied at different scales, from individuals to larger units such as cities and nation-states, all the way to the global level, can be found in Eagle and Greene (2014), though this book does not focus specifically on communication research. The journal Big Data and Society is devoted to social science research in this area, and there is a special issue of the Journal of Communication (April 2014) specifically devoted to big data and communication research.
Anderegg, W. R., & Goldsmith, G. R. (2014). Public interest in climate change over the past decade and the effects of the “climategate” media event. Environmental Research Letters, 9(5), 054005.
Aspray, W., & Hayes, B. (Eds.). (2011). Everyday information: The evolution of information seeking in America. Cambridge, MA: MIT Press.
Asur, S., & Huberman, B. A. (2010). Predicting the future with social media. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web intelligence and intelligent agent technology. Vol. 1, pp. 492–499. Washington, DC: IEEE Computer Society.
Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1132.
Barberá, P., & Rivero, G. (2014). Understanding the political representativeness of Twitter users. Social Science Computer Review, 33(6), 712–729.
Bar-Ilan, J., & Aharony, N. (2014). Twelve years of Wikipedia research. WebSci ’14: Proceedings of the 2014 Conference on Web Science (pp. 243–244). New York, NY: ACM.
Bastos, M. T., & Mercea, D. (2015). Serial activists: Political Twitter beyond influentials and the twittertariat. New Media and Society, 18(10), 2359–2378.
Bengtsson, L., Lu, X., Thorson, A., Garfield, R., & Von Schreeb, J. (2011). Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A post-earthquake geospatial study in Haiti. PLoS Med, 8(8), e1001083.
Boase, J., & Ling, R. (2013). Measuring mobile phone use: Self‐report versus log data. Journal of Computer‐Mediated Communication, 18(4), 508–519.
Boczkowski, P., & Mitchelstein, E. (2013). The news gap: When the information preferences of the media and the public diverge. Cambridge, MA: MIT Press.
Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D. I., Marlow, C., Settle, J. E., & Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489, 295–298.
Bright, J. (2017). “Big social science”: Doing big data in the social sciences. In N. Fielding, R. M. Lee, & G. Blank (Eds.), Handbook of Online Research Methods (chapter 12). London, UK: SAGE.
Bruegger, N., & Schroeder, R. (Eds.). (2017). The Web as history. London, UK: UCL Press.
Collins, R. (1994). Why the social sciences won’t become high-consensus, rapid-discovery science. Sociological Forum, 9(2), 155–177.
Conover, M., Ratkiewicz, J., Francisco, M. R., Gonçalves, B., Flammini, A., & Menczer, F. (2011). Political polarization on Twitter. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 133, 89–96. Palo Alto, CA: AAAI.
Donner, J. (2015). After access: Inclusion, development, and a more mobile Internet. Cambridge, MA: MIT Press.
Duggan, M. (2015). Mobile messaging and social media 2015. Pew Research Center, Internet, Science, and Tech, March 17–April 12.
Eagle, N., & Greene, K. (2014). Reality mining: Using big data to engineer a better world. Cambridge, MA: MIT Press.
Ekbia, H., Mattioli, M., Kouper, I., Arave, G., Ghazinejad, A., Bowman, T., & Sugimoto, C. R. (2015). Big data, bigger dilemmas: A critical review. Journal of the Association for Information Science and Technology, 66(8), 1523–1545.
Evans, J., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual Review of Sociology, 42, 18.1–18.30.
Fischer, C. (1992). America calling: A social history of the telephone to 1940. Berkeley: University of California Press.
Generous, N., Fairchild, G., Deshpande, A., Del Valle, S. Y., & Priedhorsky, R. (2014). Global disease monitoring and forecasting with Wikipedia. PLoS Computational Biology, 10(11), e1003892.
Ginsberg, J., Mohebbi, M., Patel, R. S., Brammer, L., Smolinski, M., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012–1014.
Golder, S., & Macy, M. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 6.1–6.24.
González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38, 16–27.
Hampton, K., Goulet, L. S., Rainie, L., & Purcell, K. (2011). Social networking sites and our lives. Pew Research Center, Internet, Science, and Tech, June 16.
Horrigan, J. B. (2006). The Internet as a resource for news and information about science. Pew Research Center, Internet, Science, and Tech, November 20. http://www.pewinternet.org/2006/11/20/the-internet-as-a-resource-for-news-and-information-about-science/.
Jungherr, A. (2015). Analyzing political communication with digital trace data: The role of Twitter messages in social science research. London, UK: Springer.
Kramer, A., Guillory, J., & Hancock, J. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790.
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, A., & Christakis, N. (2008). Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30(4), 330–342.
Liao, H.-T. (2009). Conflict and consensus in the Chinese version of Wikipedia. IEEE Technology and Society Magazine, 28(2), 49–56.
Licoppe, C. (2004). “Connected” presence: The emergence of a new repertoire for managing social relationships in a changing communication technoscape. Environment and Planning D: Society and Space, 22(1), 135–156.
Ling, R., Bjelland, J., Sundsøy, P. R., & Campbell, S. W. (2014). Small circles: Mobile telephony and the cultivation of the private sphere. Information Society, 30(4), 282–291.
McIver, D., & Brownstein, J. (2014). Wikipedia usage estimates prevalence of influenza-like illness in the United States in near real-time. PLoS Computational Biology, 10(4), e1003581.
Mestyán, M., Yasseri, T., & Kertész, J. (2013). Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE, 8(8), e71226.
Meyer, E. T., & Schroeder, R. (2015). Knowledge machines: Digital transformations of the sciences and humanities. Cambridge, MA: MIT Press.
Napoli, P., & Obar, J. (2015). The emerging mobile Internet underclass: A critique of mobile Internet access. Information Society, 30(5), 323–334.
Neuman, W. R. (2016). The digital difference: Media technology and the theory of communication effects. Cambridge, MA: Harvard University Press.
Neuman, W. R., Guggenheim, L., Mo Jang, S., & Bae, S. Y. (2014). The dynamics of public attention: Agenda-setting theory meets big data. Journal of Communication, 64, 193–214.
Pariser, E. (2011). The filter bubble: What the Internet is hiding from you. London, UK: Penguin.
Parks, M. (2014). Big data in communication research: Its contents and discontents. Journal of Communication, 64, 355–360.
Pasquale, F. (2015). The black box society: The secret algorithms that control money and information. Cambridge, MA: Harvard University Press.
Porter, T. (2008). Statistics and statistical methods. In T. Porter & D. Ross (Eds.), The modern social sciences (pp. 238–250). Cambridge, UK: Cambridge University Press.
Puschmann, C., & Burgess, J. (2013). The politics of Twitter data. In K. Weller, A. Bruns, J. Burgess, M. Mahrt, & C. Puschmann (Eds.), Twitter and society (pp. 43–54). Oxford, UK: Peter Lang.
Rainie, L., Smith, A., Schlozman, K. L., Brady, H., & Verba, S. (2012). Social media and political engagement. Pew Research Center, Internet, Science, and Tech, October 19.
Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology, 55(8), 743–753.
Rule, J. (1997). Theory and progress in social science. Cambridge, UK: Cambridge University Press.
Savage, M., & Burrows, R. (2007). The coming crisis of empirical sociology. Sociology, 41(5), 885–899.
Savage, M., & Burrows, R. (2009). Some further reflections on the coming crisis of empirical sociology. Sociology, 43(4), 762–772.
Savolainen, R. (2008). Everyday information practices: A social phenomenological perspective. Lanham, MD: Scarecrow.
Schroeder, R. (2007). Rethinking science, technology and social change. Stanford, CA: Stanford University Press.
Schroeder, R. (2010). Mobile phones and the inexorable advance of multimodal connectedness. New Media and Society, 12(1), 75–90.
Schroeder, R. (2014a). Does Google shape what we know? Prometheus: Critical Studies in Innovation, 32(2), 145–160.
Schroeder, R. (2014b). Big data and the brave new world of social media research. Big Data and Society, July–December, 1–11.
Schroeder, R., & Taylor, L. (2015). Big data and Wikipedia research: Social science knowledge across disciplinary divides. Information, Communication, and Society, 18(9), 1039–1056.
Segev, E., & Ahituv, N. (2010). Popular searches in Google and Yahoo!: A “digital divide” in information uses? The Information Society, 26(1), 17–37.
Settle, J. E., Fariss, C. J., Bond, R. M., Jones, J. J., Fowler, J. H., Coviello, L., . . . Marlow, C. (2016). Quantifying political discussion from the universe of Facebook status updates. Social Science Research Network.
Stockmann, D. (2013). Media commercialization and authoritarian rule in China. Cambridge, UK: Cambridge University Press.
Sullivan, J. (2012). A tale of two microblogs in China. Media, Culture, and Society, 34(6), 773–783.
Taneja, H., & Wu, A. X. (2014). Does the Great Firewall really isolate the Chinese? Integrating access blockage with cultural factors to explain Web user behavior. The Information Society, 30(5), 297–309.
Waller, V. (2011). The search queries that took Australian Internet users to Wikipedia. Information Research, 16(2).
West, R., Weber, I., & Castillo, C. (2012). Drawing a data-driven portrait of Wikipedia editors. Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration (WikiSym ’12). New York: ACM.
Whitley, R. (2000). The intellectual and social organization of the sciences. 2nd ed. Oxford, UK: Oxford University Press.
Wu, A. X., & Taneja, H. (2015). Reimagining Internet geographies: A user-centric ethnological mapping of the world wide web. Journal of Computer-Mediated Communication, 21(3), 230–246.
Zimmer, M. (2010). “But the data is already public”: On the ethics of research in Facebook. Ethics and Information Technology, 12(4), 313–325.