Since 2001, unprecedented resources have been invested in research into global terrorism, resulting in a dramatic rise in the number of academic publications on the topic. Works by scholars from largely quantitative disciplines predominate in this literature, and the ongoing development of data science and big data research has accentuated the trend. Many researchers in global terrorism have created event databases, in which every row represents a distinct terrorist attack and every column a variable (e.g., the date and location of the attack or the number of casualties). Such event data are usually extracted from news sources and undergo a process of coding, that is, the translation of unstructured text into numerical or categorical values. Some researchers collect and code their data manually; others use an automated script, or combine the efforts of humans and software. Other researchers who use event data do not collect and process their data at all; rather, they analyze other scholars’ databases. Academics and practitioners have relied on such databases for the cross-regional study of terrorism, analyzing their data statistically in an attempt to identify trends, build theories, predict future incidents, and formulate policies.
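The row-and-column layout described above can be illustrated with a minimal Python sketch. It is not the schema of any actual database: the class name `EventRecord`, its field names, and the sample values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, asdict, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EventRecord:
    """One row of a hypothetical event database: a single coded attack."""
    event_id: str
    event_date: date
    location: str
    fatalities: Optional[int]  # None when the source reports no figure
    source: str                # citation of the text the row was coded from


def column_names():
    """Return the variables (columns) shared by every row."""
    return [f.name for f in fields(EventRecord)]


def to_row(record):
    """Flatten a coded event into a plain dict, e.g. for CSV export."""
    return asdict(record)


# Placeholder values only; this is not a real event.
example = EventRecord(
    event_id="E001",
    event_date=date(2000, 1, 1),
    location="City A",
    fatalities=3,
    source="placeholder citation",
)
```

Keeping a `source` field on every row is one simple way to make each coded value traceable to the text it came from.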
Unfortunately, event data on terrorism often suffer from substantial issues of accuracy and reproducibility. Comparing the data on suicide terrorism in Israel and the occupied Palestinian territories in two of the most prominent databases in the field with an independent database of confirmed events reveals the magnitude of these problems. Among the most common pitfalls for event data are replication problems (the sources that the databases cite, if there are any at all, cannot be retrieved), selection bias (events that should have been included in the database are not in it), description bias (the details of events in the database are incorrect), and coding problems (for example, duplicate events). Some of these problems originate in the press sources that are used to create the databases, usually English-language newspaper articles; others are attributable to deficient data-gathering or coding practices on the part of database creators and coders. In many cases, these researchers do not understand the local contexts, languages, histories, and cultures of the regions they study. Further, many coders are not trained in qualitative methods and are thus incapable of critically reading and accurately coding their unstructured sources. Overcoming these challenges will require a change of attitude: truly accurate and impactful cross-regional data on terrorism can only be achieved through collaboration across projects, disciplines, and fields of expertise. The creators of event databases are encouraged to adopt the high standards of transparency, replicability, data-sharing, and version control that are prevalent in STEM fields and among software developers. More than anything, they need to acknowledge that without good and rigorous qualitative work during the stage of data collection, there can be no good quantitative work during the stage of data analysis.
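Two of the pitfalls named above, duplicate events and events missing from a database, lend themselves to a simple mechanical check. The Python sketch below is only an illustration under stated assumptions: events are plain dicts with hypothetical keys `event_id`, `date`, and `location`, and two records are treated as the same event when date and normalized location match. That matching rule is a simplification, not a method taken from any of the databases discussed, and the sample records are invented placeholders.

```python
from collections import defaultdict


def event_key(event):
    """Crude identity for an event: its date plus a normalized location.

    Assumption: two distinct attacks on the same day in the same place
    would collide under this key, so matches are candidates for human
    review rather than confirmed duplicates.
    """
    return (event["date"], event["location"].strip().lower())


def find_duplicates(events):
    """Return lists of event_ids that share the same key (coding problem)."""
    groups = defaultdict(list)
    for event in events:
        groups[event_key(event)].append(event["event_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def missing_from(database, reference):
    """Return ids of reference events absent from the database (selection bias)."""
    present = {event_key(event) for event in database}
    return [e["event_id"] for e in reference if event_key(e) not in present]


# Invented placeholder records, not real events.
database = [
    {"event_id": "D1", "date": "2000-01-01", "location": "City A"},
    {"event_id": "D2", "date": "2000-01-01", "location": "city a "},
    {"event_id": "D3", "date": "2000-02-01", "location": "City B"},
]
reference = [
    {"event_id": "R1", "date": "2000-01-01", "location": "City A"},
    {"event_id": "R2", "date": "2000-03-01", "location": "City C"},
]

duplicates = find_duplicates(database)      # [["D1", "D2"]]
missing = missing_from(database, reference)  # ["R2"]
```

A check of this kind only surfaces candidates. Deciding whether two rows really describe one attack, or whether a reference event truly belongs in the database, still depends on reading the underlying sources.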