Methodology

Data to Information

Our project illuminates trends in global assassination attempts in 2019 and 2020. We examined the effect of countries' politics, economics, and national security, among other factors, on the number of assassination attempts as well as on the survival rate and prevention rate. Our primary data source was a dataset from Global Initiative's "Global Assassination Monitor," which includes an entry for every assassination attempt in 2019 and 2020. The dataset includes information about each attempt, including the country in which it occurred. Using that country, we were able to cross-reference the dataset with data on individual countries' economies, political conditions, and quality of national security. We extracted this country-level data from research institutes such as the Heritage Foundation and the United Nations Development Programme.

Combining the primary data source with these country-specific sources, we created a master dataset in Excel. In this file, each row is a country, and each column holds information about that country, including its number of assassination attempts, prevention rate, survival rate, and other statistics we calculated in Excel from the primary data source. We then converted the Excel file to CSV and imported it into Google Colab, where we read the data into lists and then into a DataFrame. From the DataFrame we created visualizations with Plotly Express, ranging from dot plots and bar graphs to choropleth maps. These visualizations let us identify which macro-level factors had a sizable impact on the number of assassination attempts in each country. They also allowed us to draw conclusions about which factors yielded higher prevention and survival rates.
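The per-country statistics described above were calculated in Excel. For readers who prefer code, the following is a minimal Python sketch of the same kind of aggregation, not our actual workflow. The column names (`country`, `survived`, `prevented`), the yes/no encoding, and the sample rows are hypothetical. The rate definitions are also assumptions: survival rate is taken as the share of attempts in which the target survived, and prevention rate as the share of attempts that were prevented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical attempt-level rows. The column names and the yes/no encoding
# are assumptions, not the Global Assassination Monitor's actual schema.
SAMPLE = """country,survived,prevented
Country A,no,no
Country A,yes,yes
Country A,yes,no
Country B,no,no
Country B,yes,no
"""


def is_yes(value):
    """Treat a cell as true when it reads 'yes' (case-insensitive)."""
    return value.strip().lower() == "yes"


def country_stats(rows):
    """Aggregate attempt-level records into one summary dict per country."""
    counts = defaultdict(lambda: {"attempts": 0, "survived": 0, "prevented": 0})
    for row in rows:
        entry = counts[row["country"].strip()]
        entry["attempts"] += 1
        entry["survived"] += int(is_yes(row["survived"]))
        entry["prevented"] += int(is_yes(row["prevented"]))
    summary = {}
    for country, c in counts.items():
        summary[country] = {
            "attempts": c["attempts"],
            # Assumed definition: share of attempts in which the target survived.
            "survival_rate": c["survived"] / c["attempts"],
            # Assumed definition: share of attempts that were prevented.
            "prevention_rate": c["prevented"] / c["attempts"],
        }
    return summary


stats = country_stats(csv.DictReader(io.StringIO(SAMPLE)))
print(stats["Country B"])  # {'attempts': 2, 'survival_rate': 0.5, 'prevention_rate': 0.0}
```

Each row of the resulting summary corresponds to one row of a country-level master dataset like the one described above.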

Description of Data Source

Our primary data source is a dataset from Global Initiative's "Global Assassination Monitor" project, which tracks assassinations worldwide and shares geographic and thematic insights. The data source is an Excel file whose cells contain plain text. Each row represents an assassination attempt that occurred in 2019 or 2020, and each column holds information about the attempt, such as the country in which it occurred, the reported perpetrator, and whether the target survived. We chose this data source because it is easy to read and comprehensive: it reports every assassination attempt along with the known information about it. Because of the depth of information included about each attempt, including the country in which it occurred, we were able to cross-reference the dataset with other data sources. We drew country-specific data from the Heritage Foundation, the Global Organized Crime Index, the Index of Economic Freedom, the United Nations Development Programme, and Michigan State University's Global Edge. We incorporated this additional data so that we could analyze macro-level factors that might have affected the assassination attempts. Our final Excel spreadsheet included a row for each country and a column for each piece of information about it, including its number of assassination attempts, prevention rate, survival rate, and other statistics.
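Cross-referencing by country amounts to a join on the country name. The sketch below is a hypothetical illustration of that step, not the method we used in Excel. It left-joins placeholder indicator values onto per-country attempt statistics and reports countries with no match, since different sources may spell the same country differently. The country names, indicator fields, alias table, and all numbers are invented for illustration and are not real index scores.

```python
# Hypothetical alias table mapping a name used in one source to the name
# used in the attempt dataset.
ALIASES = {"Republic of C": "Country C"}


def merge_country_data(attempt_stats, indicators, aliases=ALIASES):
    """Left-join indicator fields onto per-country attempt statistics.

    Returns the merged records and a sorted list of countries that had no
    indicator match, so mismatched names can be fixed by hand.
    """
    # Rename indicator keys to the attempt dataset's spelling where known.
    normalized = {aliases.get(name, name): values for name, values in indicators.items()}
    merged, unmatched = {}, []
    for country, stats in attempt_stats.items():
        extra = normalized.get(country)
        if extra is None:
            unmatched.append(country)
            extra = {}
        merged[country] = {**stats, **extra}
    return merged, sorted(unmatched)


attempt_stats = {
    "Country A": {"attempts": 3, "survival_rate": 2 / 3},
    "Country B": {"attempts": 2, "survival_rate": 0.5},
    "Country C": {"attempts": 1, "survival_rate": 1.0},
}
# Placeholder indicator values, not real index scores.
indicators = {
    "Country A": {"economic_freedom": 60.0, "criminality": 7.0},
    "Republic of C": {"economic_freedom": 55.0, "criminality": 5.0},
}
merged, unmatched = merge_country_data(attempt_stats, indicators)
print(unmatched)  # ['Country B']
```

In pandas, `DataFrame.merge` with `how="left"` and `indicator=True` performs the same left join and flags rows that found no match.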

Data Cleaning Process

The data in the Excel spreadsheet was relatively clean to begin with. However, we needed to extract and modify the data so that it could be used for visualizations. First, we made sure that all of the cells in the spreadsheet were formatted as plain text rather than as Accounting or Currency. We then converted the file to CSV and uploaded it to Google Colab. Next, we opened and read the file and created a list for each column, or category of information, in the dataset. We iterated through the dataset, appending each value to the appropriate list and making any necessary formatting changes along the way, such as converting a value to the float data type. Finally, we combined the separate column lists into one large list and used it to create a DataFrame, which displays all of the data and serves as the basis for our visualizations.
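The list-building pattern described above might look roughly like the following minimal sketch. It is not our original Colab code. The column names and sample values are hypothetical, and the currency-stripping helper assumes values such as `"$9,500.00"` might survive the export from Excel.

```python
import csv
import io

# Hypothetical CSV export of the master spreadsheet; column names and
# values are assumptions for illustration only.
SAMPLE = """country,attempts,survival_rate,gdp_per_capita
Country A,3,0.667,"$9,500.00"
Country B,2,0.5,"$7,100.00"
"""


def to_float(text):
    """Convert a cell to float, stripping a dollar sign and thousands separators."""
    return float(text.strip().replace("$", "").replace(",", ""))


def load_columns(handle):
    """Read the CSV into one list per column, then recombine into rows."""
    reader = csv.reader(handle)
    header = next(reader)  # the first row holds the column names
    countries, attempts, survival, gdp = [], [], [], []
    for row in reader:
        countries.append(row[0].strip())
        attempts.append(int(row[1]))
        survival.append(to_float(row[2]))
        gdp.append(to_float(row[3]))
    # Combine the per-column lists into one list of rows.
    data = [list(r) for r in zip(countries, attempts, survival, gdp)]
    return header, data


header, data = load_columns(io.StringIO(SAMPLE))
print(data[0])  # ['Country A', 3, 0.667, 9500.0]
```

The resulting rows can be passed to `pandas.DataFrame(data, columns=header)` to build the DataFrame. Alternatively, `pandas.read_csv` can parse the file and infer numeric types in a single call.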