State of the Union: The words, the facts

What do presidents talk about when they address the nation during the State of the Union? How does that compare to trends in jobs, wages, education, national security, and healthcare?

 

Methodology

The State of the Union is a constitutionally required address from the president to both chambers of Congress. It serves to update the country on the current conditions of our democracy. Technically, a president’s first address to Congress is not a State of the Union, but it serves the same purpose and merits inclusion.

To create the analysis, we took transcripts from the American Presidency Project and used R to compile the text and count word frequencies. Common words (“and”, “the”, etc.) and words that occur frequently across the entire corpus (“states”) are largely filtered out. The 2018 word count is based on the speech transcript distributed prior to delivery.

More information, including sources, is available by clicking a metric in the legend, which opens that metric's individual page.

The following metrics have been adjusted for inflation: median annual wage, GDP per capita, private fixed investment (non-residential), individual income taxes paid, corporate taxes paid.

R code snippet

Pre-condition: Each SOTU address is stored as a separate text file in the working directory (e.g. 1980.txt)

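 for (year in 1980:2017) {

   # Read the transcript as whitespace-separated tokens and lowercase them.
   # quote = "" stops scan() from treating quotation marks and apostrophes
   # as string delimiters.
   corpus <- scan(paste0(year, ".txt"), what = "character", sep = "", quote = "")
   corpus <- tolower(corpus)

   # Split corpus (scan() has already split on whitespace; kept as a safeguard)
   words <- unlist(strsplit(corpus, " "))

   # Calculate word counts
   words.freq <- table(words)

   # Write word,count pairs to <year>.csv without a header row
   result <- data.frame(names(words.freq), as.integer(words.freq))
   write.table(result, file = paste0(year, ".csv"), row.names = FALSE, col.names = FALSE, sep = ",")

   # Free the per-year objects before the next iteration
   rm(corpus, words, words.freq, result)

 }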

Post-condition: Words and counts for each year are stored as a CSV file (e.g. 1980.csv)
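
The snippet above stops at raw counts. The filtering of common words and corpus-wide frequent words described in the methodology is not shown in the source. The following is a minimal sketch of how that step might look, not the code actually used: the helper name filter_counts, the column names word and count, and both word lists are assumptions made for illustration.

```r
# Hypothetical helper (not from the source): drop stop words and words that
# are frequent across the whole corpus, then sort by descending count.
filter_counts <- function(counts, stop_words, corpus_common) {
  drop <- counts$word %in% c(stop_words, corpus_common)
  kept <- counts[!drop, , drop = FALSE]
  kept[order(-kept$count, kept$word), , drop = FALSE]
}

# Illustrative lists only; the actual lists used are not published here.
stop_words <- c("and", "the", "of", "to", "a")
corpus_common <- c("states")

# Toy input in the same shape as one per-year CSV (word, count).
counts <- data.frame(
  word = c("the", "jobs", "states", "and", "wages"),
  count = c(120L, 14L, 30L, 95L, 9L),
  stringsAsFactors = FALSE
)

filtered <- filter_counts(counts, stop_words, corpus_common)
print(filtered)
```

To apply the same sketch to a file written by the loop, the counts could be loaded with read.csv("1980.csv", header = FALSE, col.names = c("word", "count"), stringsAsFactors = FALSE). Note that the loop does not strip punctuation, so tokens such as "jobs," and "jobs" would still be counted separately unless they are cleaned first.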