The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats

Few concepts figure more prominently in the study of international politics than threat. Yet scholars do not agree on how to identify and measure threats or systematically incorporate leaders' perceptions of threat into their models. In this research note, we introduce a text-based strategy and method for identifying and measuring elite assessments of international threat from publicly available sources. Using semi-supervised machine learning models, we show how text sourced from newspaper articles can be parsed to discern arguments that distinguish threatening from non-threatening states, and to measure and track variation in the intensity of foreign threats over time. To demonstrate proof of concept, we use news summaries from The New York Times from 1861 to 2017 to create a geopolitical threat index (GTI) for the United States. We show that the index successfully matches periods in US history that historians identify as high and low threat and correctly identifies countries that have posed a threat to US security at different points in its history. We compare and contrast GTI with traditional indicators of international threat that rely on measures of material capability and interstate behavior.

Introduction
Few concepts figure more prominently in the study of international politics than threat. Many theories of international politics consider variation in the international threat environment facing countries to be decisive in explaining their foreign policies and behavior, that is, in explaining leaders' decisions to spend precious resources on the military, to sacrifice autonomy by allying with other countries, and so on. Yet as essential as threat is to the study of world politics, scholars do not agree on how to identify and measure threats or systematically incorporate leaders' perceptions of threat into their models (Leeds and Savun 2007; Yarhi-Milo 2013). In this research note, we introduce a text-based strategy and method for identifying and measuring elite perceptions of foreign states' capabilities and intentions from publicly available sources. Using machine learning models, we show how text sourced from newspaper articles can be efficiently parsed to discern arguments that distinguish hostile states from peaceful countries, and to measure and track variation in elite perceptions of foreign threats over time.
To demonstrate proof of concept, we use articles from The New York Times from 1861 to 2017 to create a geopolitical threat index (GTI) for the United States. In addition to spanning more than one hundred and fifty years of American history, the analysis includes over 385,000 news stories from the "newspaper of record" in the United States about foreign nations' military capabilities, military threats and use of force, and military force postures, doctrines, and motivations. We also create separate indices for fifteen countries that US policymakers have defined as urgent or potential threats or risks to national security at various points in the nation's history. On the basis of these text data, we are able to track continuities and changes in American perceptions of the source and level of foreign threat facing the United States over time. Because the machine learning models we use here (Latent Semantic Scaling [LSS] and Newsmap) are language-versatile, the GTI model can be adapted to create similar threat indices for non-English-speaking countries with comparable publicly available data sources.
We evaluate the accuracy of our GTI in three ways. First, we consider whether variations in the index capture and match well-known historical events (e.g., the attack on Pearl Harbor, the launch of Sputnik) and periods in US history that international relations scholars, foreign policy analysts, and diplomatic historians consider "security rich" (e.g., postbellum America) or "security poor" (e.g., the Cold War). Second, we disaggregate our GTI measure by individual states to see whether it accurately "post-dicts" which great powers posed a discernible threat to US security, and when, and whether it correctly identifies periods when fears about the strategic importance of smaller states (e.g., Vietnam and Cuba) are evident in news media accounts of the period. Finally, we compare and contrast our GTI with other quantitative indices that treat defense spending, interstate disputes, alliance portfolios, and UN voting behavior as proxies for foreign threat, to show where the GTI measure and these indices converge and diverge in assessing America's international threat environment over time.
The paper is organized into three sections. In the first section, we briefly review the main conceptual approaches in international relations to identifying and measuring foreign threats. The second section describes the semi-supervised machine learning model we use to distinguish threatening from non-threatening states and to measure and track variations in the intensity of foreign threats over time. In the third section, we describe the results and assess them using the three tests or benchmarks mentioned above. We show that our model performs well in each of these tests: variations in our GTI measure correspond to well-known events and periods of relative security and insecurity in US history; the GTI clearly distinguishes between threatening and non-threatening countries and recognizes that over time some of America's friends have become foes, and vice versa; and the GTI is more granular in how it assesses threats, and more responsive to sudden shifts in the international environment, than latent-threat indicators. We conclude by discussing how semi-supervised machine learning models can be used to exploit the full potential of newspaper and other text-based data that international relations scholars rely on to understand political leaders' foreign policy choices.

Geopolitical Threat Index
The idea that states and leaders worry first and foremost about security is central to many theories of international politics (Trubowitz 2011). It is thus not surprising that international relations scholars have invested a great deal of time and effort in developing and testing measures of international threat. Many scholars model threats indirectly based on capability or behavioral indicators such as patterns of military spending (Nordhaus, Oneal, and Russett 2012), the propensity to engage in militarized interstate disputes (MIDs) (e.g., Bennett 1997), and states' overall foreign policy ideological orientation (Bueno de Mesquita 1981; Signorino and Ritter 1999). Other international relations scholars emphasize the importance of subjective factors, especially elite perceptions of potential adversaries' power and ambitions (e.g., Jervis 1976; Wohlforth 1993; Yarhi-Milo 2013). These scholars draw heavily on textual materials (e.g., newspaper articles, diplomatic cables, personal papers, and parliamentary proceedings) to reconstruct political elites' views about foreign states' capabilities and intentions, and to determine how widely those views are shared by intelligence analysts, elected officials, and other opinion makers (e.g., journalists and business leaders).
Each approach has strengths and weaknesses. Capability and behavioral indicators provide insight into states' latent or potential ability to launch an attack or engage in coercive diplomacy, and they are generally quantifiable. They thus permit statistical, cross-national analysis and provide a large universe of cases (observations) for testing theories about the possible effects of external threats on alliance formation, arms races, the use of force, and other international behavior. However, these indicators are less helpful for gaining insight into when and how political elites "code" these international signals as actual, manifest threats to the national interest. Diplomatic cables, intelligence reports, newspaper stories, and parliamentary proceedings, which provide a written record of political leaders', policymakers', and opinion makers' views of foreign actors' intentions, can shed considerable light on these questions. In the past, however, it was not feasible to factor elite opinions about threats into large-N research in systematic and parsimonious ways. 1 The tools needed to convert such texts into a usable form for statistical analysis did not exist (Benoit 2019).
Rapid advances in computational text analysis have now made it possible for international relations scholars to mine text sources for elite and mass opinion and sentiment. Indeed, there is now a growing body of empirical work that employs these methods to study a wide variety of political phenomena. Here, we show how computational text analysis can be used to model opinion makers' characterization of foreign actions and policies as friendly or hostile, cooperative or confrontational, and peaceful or belligerent. We restrict our analysis to states and follow Wallander and Keohane (1999) in defining threat as a positive probability that one state has the capability and intent to harm the security of another state. In the pages below, we rely on the judgments of The New York Times (NYT) editors and foreign correspondents about states' material power, international behavior, and strategic intentions.
The NYT has one of America's largest international reporting divisions and regularly reports on other nations' international ambitions and diplomacy, as well as their military capabilities. Boasting the highest circulation of any American metropolitan newspaper, the NYT is widely considered the most authoritative source of international (and domestic) news in the United States. It commands the attention of political, government, and business leaders (Chernomas and Hudson 2015) and enjoys similar standing internationally, where it is considered "mandatory first reading in newsrooms across the world" (Dell'Orto 2013). As the "newspaper of record," the NYT is also an important intermedia agenda setter in the United States at the national, regional, and local levels. Because it is one of the only newspapers in the world with digital archives extending back to the nineteenth century, the NYT provides a single continuous source of news reporting. 2

Computational Text Analysis
A variety of approaches and algorithms are available for large-scale computational text analysis. Broadly speaking, these approaches differ in how much human supervision (i.e., manual coding) they require (Nelson 2017; Benoit 2019). At one end of the spectrum are fully supervised machine learning models, which generate inferences or predictions about some universe of texts using a "top-down" approach. The user attaches ex ante labels to a sample set of training texts, and the machine uses the coding of these training texts to identify matches in the larger corpus of texts that has not been manually coded. At the other end of the spectrum are unsupervised machine learning approaches. These models do not rely on user-provided labels to help the machine learn. Instead, the machine exploits differences in the textual features of documents to create clusters or topics that emerge from the "bottom up," which the analyst then interprets using substantive knowledge. Topic modeling is a well-known example of the unsupervised machine learning approach to text analysis (Roberts et al. 2014). 3
Semi-supervised machine learning models lie in between these two general approaches. 4 These models try to exploit the comparative advantages of both supervised and unsupervised approaches. The main advantage of supervised learning models is that they produce results that relate directly to the analyst's substantive interests, because they learn from information (labels) provided ex ante by the analyst. One major disadvantage is that they are impractical for long historical analyses because of the high cost of training a model. 5 Unsupervised models suffer from the reverse problem: they are very cost-effective at classifying or scaling large numbers of documents, but they require the analyst to impose ex post interpretations on results and often yield topics of little theoretical interest. 6 Given our interest in identifying variations in geographically specific threats spanning a long time period and hundreds of thousands of news stories, we adopted a semi-supervised approach.
2 To our knowledge, only three US newspapers' digital archives extend back to the nineteenth century: The Chicago Tribune, The New York Times, and The Washington Post.
5 Supervised models must "see" words multiple times to "understand" them (that is, to estimate their parameters). Because substantively important words occur very infrequently, hundreds and sometimes thousands of documents must be labeled manually to train the model.

Classification and Steps
The construction of our GTI involved five phases, summarized in Figure 1: data (article) collection, text preprocessing, geographic and thematic classification, data filtering, and finally, threat (GTI) scaling. The first phase, collection, involved culling and condensing all NYT news stories involving US security. We used the NYT Application Programming Interface (API) to identify relevant stories in the newspaper's digital archival database. Searching the API with a simple Boolean query made up of keywords, we downloaded lead sentences summarizing the news stories. 7 The corpus of news summaries (N = 387,896) was then pre-processed using Quanteda, an R package for segmenting texts into "tokens" (words) and creating a statistically usable document-feature matrix from token frequencies (Benoit et al. 2018). 8 We then classified the pre-processed text data using two semi-supervised machine learning models in parallel: Newsmap and LSS. Newsmap is a geographic document classification technique designed to identify the primary geographic (country) focus of news stories (Watanabe 2018). Unlike fully supervised models, Newsmap does not require a manually coded set of training texts. Instead, it relies on a dictionary of geographic seed words consisting of country and city names, and it identifies associations between words and places based on their co-occurrence with the seed words. 9 Using this historical geographical lexicon, we trained a Newsmap model to determine the principal regional and country focus of the NYT news summaries. News summaries that Newsmap classified as primarily about events and developments inside the United States were removed from the analysis.
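The geographic classification step can be illustrated with a small Python sketch in the spirit of Newsmap. It is not the newsmap R package used in the study and does not reproduce its exact estimator. The assumptions are: tokenization is a lower-case regex split standing in for Quanteda; each summary is weakly labeled with every country whose seed words it contains; a word's association with a country is a smoothed log ratio of its relative frequency inside versus outside that country's weakly labeled summaries; and a new summary goes to the country with the highest frequency-weighted association score. The seed dictionary and summaries are hypothetical toy inputs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; a crude stand-in for quanteda's tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def weak_labels(tokens, seeds):
    """Countries whose seed words appear in the token list."""
    present = set(tokens)
    return {country for country, words in seeds.items() if present & words}


def train(summaries, seeds, smooth=1.0):
    """Smoothed log ratio of word frequency inside vs. outside each country's seed-labeled summaries."""
    docs = [tokenize(s) for s in summaries]
    vocab = {w for toks in docs for w in toks}
    inside = {c: Counter() for c in seeds}
    outside = {c: Counter() for c in seeds}
    for toks in docs:
        labels = weak_labels(toks, seeds)
        for c in seeds:
            (inside if c in labels else outside)[c].update(toks)
    scores = {}
    for c in seeds:
        n_in = sum(inside[c].values()) + smooth * len(vocab)
        n_out = sum(outside[c].values()) + smooth * len(vocab)
        scores[c] = {
            w: math.log((inside[c][w] + smooth) / n_in)
            - math.log((outside[c][w] + smooth) / n_out)
            for w in vocab
        }
    return scores


def classify(summary, scores):
    """Country with the highest frequency-weighted association score (unseen words score 0)."""
    counts = Counter(tokenize(summary))
    return max(
        scores,
        key=lambda c: sum(scores[c].get(w, 0.0) * n for w, n in counts.items()),
    )


# Hypothetical seed dictionary ("us" marks domestic stories) and toy summaries.
SEEDS = {
    "us": {"washington", "congress"},
    "germany": {"germany", "berlin"},
    "japan": {"japan", "tokyo"},
}
SUMMARIES = [
    "Berlin orders new warships as Germany expands navy",
    "Germany and the kaiser discuss the navy",
    "Tokyo cabinet debates Japan fleet in Manchuria",
    "Japan troops advance in Manchuria",
    "Congress debates tariff in Washington",
    "Washington senators debate the tariff",
]
scores = train(SUMMARIES, SEEDS)
print(classify("Kaiser reviews the navy", scores))  # germany
```

No seed word appears in the test sentence; the assignment rests on words such as "kaiser" and "navy" that co-occurred with German seed words, which is why learning associations beyond the dictionary matters. In a pipeline like the one described above, summaries assigned to the domestic key would then be dropped before threat scaling.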
The same corpus of pre-processed news story summaries was submitted to a semi-supervised thematic categorization model called LSS. 10 Like Newsmap, LSS relies on a small set of user-provided polarity words, or "seed words." These seed words are then used to assign polarity scores to the other words in the corpus of texts (news summaries). The process in phase three (geographic and thematic classification) involves three steps. In the first step, the semantic proximity between all words in the corpus is computed, using a word embedding technique to estimate the cosine similarity between words. In the second step, these words are assigned polarity scores based on their proximity to the user-defined seed words. Finally, a polarity score for each document is computed by weighting the individual words' polarity scores by how frequently they appear in the document; that is, we take the frequency-weighted average of the words' polarity scores for each document. 11
We used LSS to generate polarity scores for each word in The New York Times corpus of news summaries, using seed words related, positively and negatively, to "hostility." 12 Our positive (+1) seed words were "adversary," "enemy," "foe," and "hostile," including the plural forms of each word. Our negative (−1) seed words were "aid," "ally," "friend," and "peaceful," again with their plural forms. Figure 2 summarizes the polarity score of each word in the corpus (horizontal axis) and its frequency (vertical axis). We have highlighted a sample of words that are either positively or negatively associated with hostility. Words close to zero on the horizontal axis are neutral, in the sense that they are at the same semantic distance from the seed words on the right side and on the left side of the space. 13 The location of most words to the right or left of the neutral point on the horizontal axis makes intuitive sense. 14
In Figure 3, we plot the individual polarity scores of documents by year. The circles in the figure represent the polarity scores for 10,000 stories randomly drawn from the full sample of 387,896 stories. News summaries with a score above zero are stories about hostile or "antagonistic" behavior; those with a score below zero are stories about non-hostile or "friendly" behavior. We also plot the moving average. 15 As Figure 3 indicates, the level of hostile behavior facing the United States varies considerably over time. During the so-called Awkward Years of the 1870s and 1880s, when America was on the periphery of the world economy and had few international ambitions, it faced comparatively few threats to its interests internationally (Pletcher 1962). By contrast, NYT reporting of hostile behavior increases sharply during World War I and before and during World War II. For most of the Cold War, the polarity score in Figure 3 remains above zero. Not surprisingly, the main exception occurs during the era of détente in the 1970s, when tensions between the United States and the USSR abated and Washington began to normalize relations with Beijing. The polarity scores then increase in the 1980s, during the so-called Second Cold War. Since the mid-1990s, the polarity score has averaged less than zero, as one would expect during an era of "unipolarity," when the United States faced no peer rival.
The last two phases in the process described in Figure 1 involved data filtering and constructing the GTI. The data filtering stage removes news summaries that Newsmap classifies as principally about the United States, because we are only interested in news stories about foreign nations. 16 We then used the remaining news stories to compute our GTI for all foreign nations in our sample (225 in total over the study period) and separate indices for each of the following fifteen countries: Afghanistan, Britain, Canada, China, Cuba, France, Germany, Iran, Iraq, Japan, Mexico, Russia, Spain, Syria, and Vietnam. For each country and year, the GTI is the proportion of news summaries about that country's hostile military actions out of the total number of news stories about that country's military behavior. We assume that the annual number of news articles the NYT published about each country's hostile foreign policy behavior and policies was proportional to the level of threat that country was perceived to pose to the United States in that year.
10 … where $\cos(v_x, v_y)$ is the cosine similarity between the word vectors of words x and y. Following Turney and Littman (2003), we set the document dimension to 300 in the SVD.
11 We compute the polarity score of a summary article k, which comprises features F, by taking the sum of the polarity scores $g_f$ weighted by the word frequencies $h_f$:
$$y_k = \frac{1}{N} \sum_{f \in F} g_f h_f$$
where N is the total number of words in the document.
12 Initially, we tried to measure threat directly using threat-related seed words in the text. However, we found that we achieved greater accuracy by aggregating the number of stories referring to hostile state behavior by year. This may be because measuring threat directly requires more information about the context of events than the short NYT API news summaries contain.
13 Thus, frequently used words like "war," "peace," and "military," which appear in the center of Figure 2, tend to have small polarity scores because they appear as often in news stories about allies and friends as they do in stories about adversaries and foes.
14 Without access to the complete articles behind the API summaries, we cannot determine the precise accuracy of the polarity scores for words in the corpus. However, LSS's ability to classify stories about hostile and non-hostile state behavior offers some degree of confidence, as does the intuitive placement of extreme and neutral words in the two-dimensional space in Figure 2.
15 To make the results easier to interpret visually, the polarity scores are normalized by the standard deviation and centered around the global mean.
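The LSS scoring and GTI aggregation steps can be sketched in plain Python. Three assumptions should be flagged. First, the word polarity $g_f$ is taken to be the mean, over seed words $s$, of $\cos(v_s, v_f)$ times the seed polarity $p_s$; this is assumed to follow the usual LSS formulation, since the corresponding equation is missing from the text. Second, the two-dimensional word vectors are hypothetical toy values, whereas the study derives 300-dimensional vectors from an SVD. Third, a summary counts as hostile when its score is above zero, following the reading of Figure 3 above. The document score implements the frequency-weighted average given in footnote 11. This is an illustration, not the authors' code or the LSX R package that implements LSS.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def word_polarity(vectors, seeds):
    """g_f = (1/|S|) * sum over seeds s of cos(v_s, v_f) * p_s (assumed LSS form)."""
    return {
        f: sum(cosine(vectors[s], v) * p for s, p in seeds.items()) / len(seeds)
        for f, v in vectors.items()
    }


def doc_polarity(tokens, g):
    """y_k = (1/N) * sum_f g_f * h_f; words without a vector count toward N with g_f = 0."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(g.get(f, 0.0) * h for f, h in counts.items()) / len(tokens)


def gti(summaries, g):
    """Share of each (country, year)'s summaries whose polarity score is above zero."""
    hostile, total = defaultdict(int), defaultdict(int)
    for country, year, tokens in summaries:
        total[country, year] += 1
        if doc_polarity(tokens, g) > 0:
            hostile[country, year] += 1
    return {key: hostile[key] / total[key] for key in total}


# Hypothetical 2-d word vectors (the study uses 300-d SVD vectors).
VECTORS = {
    "enemy": (1.0, 0.1), "hostile": (0.9, 0.2),
    "ally": (-1.0, 0.1), "friend": (-0.9, 0.2),
    "troops": (0.6, 0.8), "treaty": (-0.7, 0.7), "war": (0.05, 1.0),
}
# A subset of the paper's seed words: +1 hostile, -1 friendly.
SEEDS = {"enemy": 1, "hostile": 1, "ally": -1, "friend": -1}

g = word_polarity(VECTORS, SEEDS)
index = gti([
    ("germany", 1917, ["troops", "war"]),
    ("germany", 1917, ["enemy", "troops"]),
    ("germany", 1917, ["treaty", "talks"]),
    ("britain", 1917, ["ally", "treaty"]),
], g)
print(index)  # germany 1917: 2 of 3 summaries hostile; britain 1917: none
```

In this toy space "war" lands close to zero because it sits almost equally far from the hostile and friendly seeds, which mirrors the behavior described in footnote 13.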

Validity Checks
We also conducted two validity checks and a sensitivity check to see how well Newsmap and LSS coded the NYT news summaries. The first check was a manual one, involving two steps. First, we randomly selected fifty foreign news summaries per decade over the study period (850 summaries in total) and manually classified each story by the principal country it was about. 17 We then compared the machine's coding of these stories against our own coding of the same stories. Newsmap correctly classified the news summaries 76 percent of the time. Next, we compared LSS's classification of the news summaries as hostile or peaceful behavior with our own assessment of the summaries' content. Following Young and Soroka (2012), we aggregated the classification results by decade and country, because our primary interest is in measuring threat levels annually for countries rather than in classifying individual stories. As Figure 4 indicates, the machine-coded and manually coded proportions of hostile summaries were strongly correlated, both by decade (r = 0.82) and by country (r = 0.94). 18
We also wanted to check the reliability of the API summaries for classifying stories substantively. Although an invaluable source of information for historically oriented research, the API provides only short summaries of news articles. To determine whether the summaries contain sufficient information to classify each story correctly in terms of country focus and level of hostility, we collected a sample of full-text NYT articles (N = 3,890) published between 1980 and 2018 from the Nexis database, using the same Boolean query we used to collect the API news story summaries. 19 Using Quanteda, we split the full-text articles into segments of twenty tokens each, the average length of an API summary. We then processed the tokens using the Newsmap dictionary and the LSS model described above. The results, reported in Figure 5, show that although the median length of full-text NYT articles is 612 tokens, the first twenty-token segment contains more of the key information about threats to the United States than the remaining twenty-token segments that make up the story. Countries that are the focus of the news stories are mentioned, on average, 1.5 times more often in the first twenty-token segment. In addition, the polarity scores are higher in the first twenty-token segment than in the rest of the news story.
16 Domestic stories that are filtered out account for roughly 40 percent of the total number of news summaries. These stories either contain names strongly associated with the United States or entirely lack names associated with foreign countries.
17 Since the corpus has many domestic news summaries, we first separated domestic and foreign news summaries using the Newsmap classifier. We confirmed that the machine correctly distinguished between foreign and domestic news summaries 77 percent of the time.
18 Roughly 59 percent of the individual summaries classified as stories about hostile state behavior agreed between machine and manual classification. Agreement between computer and manual classification of individual summaries is typically lower because human coders classify texts based on their understanding of historical context, whereas machines code solely on the basis of the words that appear in the texts.
19 Unfortunately, the API summaries do not provide access to the original article to test for robustness. As a result, we relied on Nexis to collect the original, full-text articles.
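The validity checks described above reduce to a few simple computations, sketched here in Python under stated assumptions: labels are binary (1 = hostile); item-level agreement is simple percent agreement; aggregate agreement is the Pearson correlation between machine and manual hostile shares per group (decade or country), as in the aggregation following Young and Soroka; and full texts are cut into consecutive twenty-token segments. The labels and tokens are hypothetical toy data, not the study's coding.

```python
import math
from collections import defaultdict


def accuracy(machine, manual):
    """Share of items on which machine and manual labels agree."""
    return sum(m == h for m, h in zip(machine, manual)) / len(manual)


def hostile_share(groups, labels):
    """Proportion of items labeled hostile (1) within each group, e.g., decade."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, label in zip(groups, labels):
        totals[group] += 1
        hits[group] += label
    return {group: hits[group] / totals[group] for group in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def segments(tokens, size=20):
    """Cut a full-text article into consecutive segments of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def first_vs_rest(tokens, term, size=20):
    """Mentions of `term` in the first segment vs. mean mentions per later segment."""
    segs = segments(tokens, size)
    if not segs:
        return 0, 0.0
    rest = [seg.count(term) for seg in segs[1:]]
    return segs[0].count(term), (sum(rest) / len(rest) if rest else 0.0)


# Hypothetical hand-coded sample: decade of each summary and its labels.
decades = [1910, 1910, 1910, 1940, 1940, 1940, 1990, 1990, 1990]
machine = [1, 1, 0, 1, 1, 1, 0, 0, 1]
manual = [1, 0, 0, 1, 1, 0, 0, 0, 0]

m_share = hostile_share(decades, machine)
h_share = hostile_share(decades, manual)
keys = sorted(m_share)
print(accuracy(machine, manual))
print(pearson([m_share[k] for k in keys], [h_share[k] for k in keys]))
```

In this toy example the individual labels agree only two-thirds of the time, yet the decade-level shares move in lockstep (r = 1), which illustrates the point in footnote 18 that aggregate agreement can be much higher than item-level agreement.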
Despite their short length, then, the NYT API summaries offer valuable information on both the source and the level of threat facing the United States.
Finally, we ran a simple keyword frequency analysis as a sensitivity check. Although we are principally interested here in the geographic location of perceived threats, we wanted to know whether the frequency of keywords for well-known ideologies (e.g., communism, fascism) and warfare strategies (e.g., trench warfare, guerilla warfare) matched what one might expect given the history of the United States, the rise and fall of different ideologies internationally, and changes in military technology and warfare. Figure 6 summarizes the normalized frequency of keywords for ideology and warfare by year from 1861 to 2017. The trends square with common knowledge. The top panel shows that "fascism" appears frequently in NYT news stories during the 1930s and 1940s, while "communism" increases significantly in the 1940s and 1950s. "Imperialism" occurs frequently before World War I and peaks during the surge of anti-colonial independence movements after World War II. The keyword "terrorism" increases in the 1970s and 1980s and especially in the 2000s, following the September 11 attacks.
The trends in the lower panel of Figure 6 also square with the history of warfare and military strategy. We would expect the keyword "trench" to register a high frequency during World War I and then to drop off as trench warfare became an anachronism. As Figure 6 indicates, the term "Blitzkrieg" is closely associated with World War II. While guerilla warfare has a long history, in the modern era the keyword "guerilla" is closely associated with the Cold War (e.g., Vietnam, Angola, Nicaragua, and Afghanistan). The keyword "nuclear" appears after the bombings of Hiroshima and Nagasaki. Its frequency varies over time, surging during periods when fears of nuclear war were high (e.g., the Cuban Missile Crisis; US-Soviet tensions in the early 1980s) as well as during periods of heightened concern about nuclear proliferation (e.g., Iraq, Iran, and North Korea in the early 2000s). Given the growing concern about "cyber warfare" in the United States (and elsewhere) in the past decade, it is not surprising that the term appears with great frequency in the 2010s.
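The keyword check can be outlined with a short Python function. Because the text does not say how Figure 6 normalizes frequencies, the sketch assumes one simple choice: the yearly share of summaries that contain the keyword, rescaled so that the keyword's peak year equals 1. Exact token matching is also an assumption; "trench" would not match "trenches" without stemming or a wildcard pattern. The toy summaries are hypothetical.

```python
from collections import defaultdict


def keyword_trend(summaries, keyword):
    """Yearly share of summaries containing `keyword`, rescaled so the peak year is 1."""
    hits, totals = defaultdict(int), defaultdict(int)
    for year, tokens in summaries:
        totals[year] += 1
        if keyword in tokens:
            hits[year] += 1
    share = {year: hits[year] / totals[year] for year in totals}
    peak = max(share.values(), default=0.0)
    # Return years in chronological order; a keyword never seen maps to 0.
    return {year: (s / peak if peak else 0.0) for year, s in sorted(share.items())}


# Hypothetical tokenized summaries: (year, tokens).
toy = [
    (1916, ["trench", "raid"]), (1916, ["trench", "gas"]), (1916, ["navy"]),
    (1941, ["blitzkrieg"]), (1941, ["trench"]), (1941, ["navy"]), (1941, ["air"]),
]
print(keyword_trend(toy, "trench"))
```

Rescaling each keyword to its own peak makes series of very different overall frequency comparable on one plot, at the cost of hiding their absolute levels.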

America's Changing Threat Environment
In this section, we evaluate our GTI in three ways. We begin by considering how well the overall pattern of threat gleaned from NYT news summaries for the full sample of 225 foreign nations matches periods in US history that historians identify as high and low threat. To assess robustness, we then disaggregate the GTI for a sample of fifteen countries to judge how well it captures the ebb and flow of US relations with different nations, from those with which the United States has generally enjoyed friendly relations to those that have been America's foes at one or more points in its history. These countries account for 43 percent of total foreign news coverage over the study period. Finally, we consider how well changes in elite assessments of manifest threat correlate with changes in latent indicators of the material capabilities and international behavior of the fifteen countries in our sample.

Aggregate Threat Patterns
The pattern of high and low threat in the aggregate GTI (Figure 7) corresponds closely to historical accounts of America's foreign relations (e.g., Herring 2008). During much of the nineteenth century, for example, when the United States faced few threats to its security from abroad, the threat index is relatively low. By contrast, our text-generated threat index is generally higher in the twentieth century, especially during periods when the United States confronted rising powers: Hitler's Germany and Imperial Japan in the 1930s and 1940s, and the Soviet Union during the Cold War. During the Cold War (1950-1991), the mean GTI score is 0.50; from 1861 to 1913, by contrast, it is 0.45. We also see that the threat index spikes in wartime (e.g., the Spanish-American War, World Wars I and II, and the Persian Gulf War) and decreases rapidly after hostilities cease.
The GTI also captures both gradual and sudden changes in the international threat environment. The mounting challenge posed by German and Japanese power during the 1930s is reflected in the GTI's steady rise in Figure 7, as are shocks associated with key events such as the sinking of the Lusitania in 1915, which presaged the US declaration of war two years later, and the September 11, 2001 attacks that led to America's wars in Afghanistan and Iraq. Within-period variation is also evident in the figure. During the Cold War, for example, the index is elevated in the 1950s and 1960s, when the rivalry was especially intense. It is considerably lower in the 1970s, when tensions eased during the era of US-Soviet détente.

America's Friends and Foes
We also created separate GTI measures for each of the fifteen countries in our sample. We divided the countries into two groups: states typically classified by international relations scholars as great powers for some or all of the past 150 years (Snyder 1991; Mearsheimer 2001) and smaller states that have been a source or location of a perceived threat to US interests. Our list of great powers includes Britain, China, France, Germany, Japan, Russia (Soviet Union), and Spain. The smaller states were subdivided into two groups: those located or having interests in the Western Hemisphere (Canada, Cuba, and Mexico) and states located in other parts of the world (Afghanistan, Iraq, Iran, Syria, and Vietnam). In the late nineteenth century, when America was still a regional power, US policymakers were particularly concerned about states in the Western Hemisphere. Since America's emergence as a global power after World War II, states located in other parts of the world have been of growing concern to US policymakers.
We begin with Germany and Japan, former adversaries of the United States that are now among America's closest allies. Figure 8 tracks changing perceptions of each country since the Civil War. There are few, if any, surprises here. In the case of Germany, the threat index rises and falls as one might expect: it spikes during World War I and again during World War II. In other periods, the threat index is very low. This too conforms to expectations. Germany was of little concern to the United States during the nineteenth century (Jonas 1985). During the Cold War, West Germany was closely allied to the United States. Indeed, from the mid-1960s to the current era, the GTI for Germany is close to zero.
The pattern in Japan's case is similar. Here too, the threat index captures America's worries about Japanese power at different historical junctures. We see clear spikes in the GTI during the Russo-Japanese War (1904-1905) and Japan's invasion of Manchuria in 1931, both of which caused concern in the United States (Green 2017). We also see a spike with Japan's surprise attack on Pearl Harbor (1941). Meanwhile, Japan's low threat score in the nineteenth century is consistent with scholarship on US-Japanese relations, which were generally good (Green 2017). Since the signing of the Treaty of Peace with Japan (1951), which formally ended the war and America's occupation of Japan, and the US-Japanese Mutual Security Treaty (1951), Japan has scored very low on the threat index. This too conforms to expectations. Like Germany, Japan has been considered a trusted ally and friend by the vast majority of Americans for decades.
Figure 9 focuses on China and Russia, two former adversaries of the United States whom many Americans now view as competitors and potential adversaries. Of the two nations, Russia looms as the larger threat to American interests over most of the past 150 years. This is especially clear during the long Cold War, when there is a sizable gap in the GTI between Russia and China. To be sure, there are periods when concerns about China eclipse worries about Russia.
Figure 10 focuses on two long-time allies of the United States: Britain and France. Overall, the level of threat posed by London and Paris to American interests is comparatively low. Concerns about British and French ambitions and power do surface from time to time. In the case of Great Britain, the threat index spikes on several occasions in the late nineteenth century and at the outbreak of World War I and World War II.21 In the case of France, the index surges in 1870, when America favored Germany over France in the Franco-Prussian War (Schieber 1921); again in 1923, in opposition to France's occupation of Germany's Ruhr Valley; and in 1954, when American-backed French forces suffered military defeat in Indochina.
21 …appears to reflect concerns about Britain's security in the face of German threats and attacks. We manually checked stories about Britain from 1939 to 1941. There is a noticeable increase in stories on "The Blitz," the German bombing campaign against the UK, including what Britain's ability to defend itself means for the United States. This appears to explain the surge in Britain's GTI score in 1940.
Finally, Figures 11 and 12 track the threat index for our two groups of smaller states. Figure 11 includes countries in the Western Hemisphere; Figure 12 covers countries located in other parts of the world, principally the Middle East. Our expectation is that the GTI scores for the first group will be considerably higher in the nineteenth century, when America still defined its vital interests in regional terms. Conversely, states in the second group should be of greater concern to the United States since World War II, when the United States became a global power with far-reaching interests. This is what we see. The only exception is Cuba, which becomes a source of great concern in the United States following the Cuban Revolution and the establishment of a new government led by Fidel Castro in 1959.

Comparison of GTI and Other Threat Measures
In this section, we compare our GTI measure of elite threat perception to other measures of foreign threat. Most existing quantitative measures of foreign threat rely on indicators of material capability (e.g., military spending), international behavior (e.g., the propensity to threaten or use military force), or overall foreign policy orientation as proxies for foreign threat. Well-known examples include Nordhaus, Oneal, and Russett's (2012) Liberal-Realist Model (LRM) and Leeds and Savun's (2007) measure of threat (LS), as well as the Composite Index of National Capability (CINC) (Singer, Bremer, and Stuckey 1972) and Militarized Interstate Dispute (MID) (Palmer et al. 2015) measures of relative power and interstate conflict.22 Bailey, Strezhnov, and Voeten's (2017) measure of UN voting dissimilarity (UN) is another measure that can be used to model a state's threat environment.23 We compare our GTI index to these five measures using Pearson's correlation coefficients. The results are summarized in Tables 1 and 2, for major powers and small states, respectively.
As Table 1 indicates, the correlation between GTI and the other measures of foreign threat varies considerably.24 On average, GTI correlates most strongly with CINC (r = 0.35), LRM (r = 0.35), and LS (r = 0.30). GTI's correlation is much weaker for MIDs (r = 0.21) and UN (r = −0.07). The correlation between GTI and the other foreign threat indicators also varies considerably by major power. The only exception is CINC, which shows a modest but consistent correlation between potential adversaries' material capabilities and American elite perceptions of threat.25 The more major powers invest resources in the capacity to project power (e.g., strengthening armies; expanding military-industrial capacity), the more likely American political leaders and opinion makers are to view those investments as threatening to US interests.
22 For each of these five measures, we constructed a threat index for the United States. LRM is the predicted probability that a country will become involved in militarized disputes based on its relative power, distance from other countries, alliance relationships, degree of democracy or autocracy, and degree of integration into the international system. It covers the period from 1951 to 2001. MIDs refer to instances where the United States was involved in a militarized dispute short of war with one of these fifteen countries. We use version 4.0. CINC is the Composite Index of National Capability of the countries and a standard measure of national power. We use version 5.0. LS refers to the CINC score for non-US allies in the ATOP database. We include only countries that have an S score with the United States that is less than the median value of 0.458. UN is the ideal point distance from the United States measured by voting patterns at the United Nations.
23 We use the ideal point distance to calculate UN voting dissimilarity scores between the United States and the fifteen countries in our sample.
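The LS construction in footnote 22 amounts to a filter over country-years. The sketch below is one reading of that footnote, written under stated assumptions: the `CountryYear` record, its field names, and the choice to return `None` for excluded country-years are hypothetical, and we assume the S score is a similarity score with the United States, so that values below the 0.458 median mark states whose alignments differ from America's.

```python
from dataclasses import dataclass
from typing import Optional

# Median S score reported in footnote 22; used here as the inclusion cutoff.
S_MEDIAN = 0.458


@dataclass
class CountryYear:
    """Hypothetical record for one country in one year."""
    country: str
    year: int
    cinc: float      # CINC score: the country's share of world material capabilities
    s_score: float   # assumed: S similarity score with the United States
    us_ally: bool    # assumed: True if ATOP records an alliance with the US that year


def ls_threat(row: CountryYear, cutoff: float = S_MEDIAN) -> Optional[float]:
    """Return the country's CINC score if it counts toward LS, else None.

    A country-year counts only if it is not a US ally and its S score
    with the United States falls below the cutoff (assumed reading).
    """
    if row.us_ally or row.s_score >= cutoff:
        return None
    return row.cinc
```

Under this reading, a non-allied state with an S score of 0.30 contributes its CINC score, while an ally, or a state at or above the median, is dropped from the LS series for that year.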
The results in Table 1 are suggestive. One plausible interpretation of the patterns we see is that in gauging major powers' geopolitical intentions, American political elites and opinion makers find capabilities (e.g., military spending) a more credible (costly) signal of foreign states' intentions than militarized disputes, alliance commitments, or public support or disapproval. This may be true for major powers, but it does not appear to be the case for smaller states. As Table 2 indicates, GTI correlates weakly, or even negatively, with CINC for these states. This may be because American policymakers have viewed most of these smaller states as proxies or surrogates for one of the major powers (e.g., Vietnam for the Soviet Union during the Cold War). The coefficients for Afghanistan, Cuba, and Vietnam are negative for at least three of the five threat indicators. US perceptions of threat from these states would appear to have more to do with their status as chess pieces in a larger great power competition than with their independent policy decisions or actions.
Figure 13 compares the different foreign threat indicators over time to see how well they capture well-known periods of high and low threat in American history. The upper panel runs from 1940 to 2017 because the LS, LRM, and UN indicators do not cover as much historical ground as GTI; this longer coverage is one of GTI's comparative advantages. In the ebb and flow of these three indicators, LRM tracks GTI most closely, visually confirming the correlation coefficients in Table 1. In the lower panel, GTI and MIDs are the most similar visually, especially before World War II. Looking at the two panels together, it seems clear that GTI captures a dimension of threat assessment that differs from non-text-based measures of foreign threat.
24 In general, the five other measures of foreign threat correlate weakly with each other. Results are available upon request.
25 Among the major powers, the only exception is China.
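The comparisons in Tables 1 and 2 and Figure 13 rest on Pearson's r, which can only be computed over years that both series cover; this is why measures with shorter coverage narrow the comparison window. A minimal sketch, assuming each indicator is stored as a hypothetical year-to-value mapping for one country, correlates two series on their overlapping years. The `mean_correlation` helper shows one way an "on average" figure could be obtained from per-country coefficients; that averaging step is our assumption, not a documented procedure.

```python
import math
from typing import Dict, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def overlap_correlation(gti: Dict[int, float], other: Dict[int, float]) -> float:
    """Correlate two year-indexed series using only the years both contain."""
    years = sorted(set(gti) & set(other))
    return pearson_r([gti[y] for y in years], [other[y] for y in years])


def mean_correlation(per_country: Dict[str, float]) -> float:
    """Unweighted mean of per-country coefficients (assumed aggregation)."""
    if not per_country:
        raise ValueError("no coefficients to average")
    return sum(per_country.values()) / len(per_country)
```

With toy values (not the paper's data), `overlap_correlation({1999: 9.0, 2000: 1.0, 2001: 2.0, 2002: 3.0}, {2000: 2.0, 2001: 4.0, 2002: 6.0, 2003: 0.0})` ignores 1999 and 2003, correlates only 2000-2002, and returns 1.0.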

Conclusion
In this research note, we have introduced a text-based strategy and method to identify and measure foreign threats. International relations scholars recognize the importance of incorporating elite perceptions of foreign threat into the analysis of international politics and foreign policy. However, before the advent of large-scale computational text analysis, it was not possible to exploit the full potential of available data sources for understanding threat perceptions. Semi-supervised machine learning models like the ones used here offer a consistent, cost-effective way to systematically analyze how political leaders and opinion makers view their country's geopolitical circumstances. Using the American case for illustrative purposes, we trained a machine learning algorithm to parse NYT news summaries and generate a historical timeline measuring the ebb and flow of foreign threats to American interests since the Civil War. We selected states widely considered to be friends of the United States in some periods and foes in others. We also included small countries as well as great powers, democratic as well as authoritarian countries, and distant as well as neighboring states.
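The semi-supervised scoring summarized above can be illustrated with a simplified sketch in the spirit of Latent Semantic Scaling (LSS), one of the two models this note names. It is not the authors' pipeline: the word embeddings and seed words below are hypothetical, and LSS as usually implemented derives word vectors from the corpus itself and rescales the final scores, both of which are omitted here. The sketch gives each word a polarity equal to the mean of its seed-signed cosine similarities to the seed words, then scores a document as the frequency-weighted mean polarity of the words it contains.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_polarity(embeddings: Dict[str, Sequence[float]],
                  seeds: Dict[str, int]) -> Dict[str, float]:
    """Polarity of each word: mean over seeds of (seed sign * cosine similarity).

    Seed signs are +1 for hypothetical 'threat' seeds and -1 for
    hypothetical 'non-threat' seeds.
    """
    usable = {s: sign for s, sign in seeds.items() if s in embeddings}
    if not usable:
        raise ValueError("no seed word has an embedding")
    return {
        word: sum(sign * cosine(vec, embeddings[s]) for s, sign in usable.items())
        / len(usable)
        for word, vec in embeddings.items()
    }


def document_polarity(tokens: List[str], polarity: Dict[str, float]) -> float:
    """Frequency-weighted mean polarity of scored tokens (0.0 if none are scored)."""
    counts = Counter(t for t in tokens if t in polarity)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(polarity[t] * n for t, n in counts.items()) / total
```

Because each word is compared only with a handful of seed words, no labeled training documents are needed; in an approach like this, the choice of seeds is the main modeling decision.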
Overall, the model efficiently distinguishes friend from foe and, importantly, adapts as elite opinion about America's international threat environment changes over time. In most instances, the GTI index captures the countries that international relations scholars and diplomatic historians categorize as enemies in one era or another. Meanwhile, countries categorized as enemies in one period were appropriately recategorized as friends in another, and vice versa. Our machine learning text analysis model also discriminates between periods of "low threat," like the postbellum period in America in the late 1800s, when US policymakers thought national security was plentiful, and other periods in American history, such as the Cold War, when US policymakers operated on the assumption that they had little room for error internationally. Finally, a comparison of various threat indicators suggests that a text-based approach to threat assessment captures facets missed by measures relying solely on states' capabilities (e.g., military spending), foreign policy behavior, or geographic proximity.
Our text-based approach to threat assessment can be strengthened in several ways. As noted above, one limitation of our model is that it is based on NYT news summaries rather than full-text articles. This may help explain the anomalous cases (e.g., the relatively low threat score for China during the 1950s and 1960s). Another limitation is that our machine learning model is based on a single source of reporting. As more and more newspapers digitize their archives, it will be possible to expand the corpus of stories and control for possible variation in newspapers' international coverage as well as possible regional (e.g., interior versus coastal) bias in US news reporting.26 Because Newsmap and LSS can be easily adapted to other languages, it will also be possible to extend the analysis of threat perception to other countries where major newspaper archives are available in digital form (e.g., Britain, China, Japan, and Israel).27 Finally, in principle these models can be applied dyadically to study strategic interaction between states. Are threat perceptions as interdependent and mutually reinforcing as theories of alliance formation, security dilemmas, and great power transition suggest? Such questions are ripe for large-scale machine-readable text analysis of the type discussed here.
26 The model could also be expanded to include other sources of online text data such as the Foreign Relations of the United States (FRUS). This would make it possible to see how similar media and government assessments of foreign threats are and how they might diverge (e.g., which is more forward-looking or reactive). FRUS is available at the History Lab, an interdisciplinary collective. See http://history-lab.org.
Downloaded from https://academic.oup.com/isq/advance-article/doi/10.1093/isq/sqab029/6278490 by guest on 13 August 2021

Figure 1. Phases of semi-supervised classification of text.

Figure 2. Distribution of polarity scores and word frequencies.

Figure 3. Polarity scores for The New York Times news summaries, 1861-2017.

Figure 4. Level of agreement between manual and LSS classification, by decade and country.

Figure 5. Position of countries and hostile actions mentioned in full-text articles.

Figure 7 summarizes the total GTI score for the United States for all foreign nations. The overall pattern of high

Figure 7. Baseline GTI for the United States, 1861-2017 (based on the full sample of 225 countries and kernel smoothed by ±1 year).

Figure 8. Threats to US interests by country: Germany and Japan, 1861-2017.

Figure 13. GTI and other measures of foreign threat facing the United States.