# Regressions

For this correlation and regression practice, I use a county-level voter turnout dataset. You can get this dataset by request.

use "county_turnout.dta"
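Before plotting, it can help to confirm that the variables used below are present and look sensible. This is a minimal sketch that assumes the three variable names used later in this tutorial (`perc_turnout`, `perc_mille`, `perc_college`) exist in the dataset and are presumably measured in percent.

```stata
* Variable types and labels for the three variables used in this tutorial
describe perc_turnout perc_mille perc_college

* Basic ranges: values outside 0-100 would suggest a coding problem
* (assuming the variables are stored as percentages)
summarize perc_turnout perc_mille perc_college

* Count missing values, since they reduce the sample used in later commands
misstable summarize perc_turnout perc_mille perc_college
```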


The research question for this tutorial is: Are counties with more millennials likely to have lower turnout?

Let’s make a scatter plot.

scatter perc_turnout perc_mille


Optional:

1. Make it prettier using grstyle (a community-contributed package; install it with `ssc install grstyle`).
2. Add the line of best fit to the scatter plot.

grstyle init
grstyle set plain

scatter perc_turnout perc_mille

twoway (scatter perc_turnout perc_mille)(lfit perc_turnout perc_mille)


Is there a linear relationship?

### Correlation

Let’s look at the correlation between these two variables.

pwcorr perc_turnout perc_mille

pwcorr perc_turnout perc_mille, sig


Interpret the result. Is the linear relationship between voter turnout and the percentage of millennials statistically significant?
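To see where the significance level reported by `pwcorr, sig` comes from, the test can be reproduced by hand. The sketch below uses the standard t-test for a correlation coefficient, t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom. The scalar names (`rho_hat`, `n_obs`, `t_stat`, `p_val`) are made up for illustration.

```stata
* Store the correlation and sample size returned by correlate
quietly correlate perc_turnout perc_mille
scalar rho_hat = r(rho)
scalar n_obs   = r(N)

* t statistic for H0: correlation = 0, and its two-sided p-value
scalar t_stat = rho_hat * sqrt((n_obs - 2) / (1 - rho_hat^2))
scalar p_val  = 2 * ttail(n_obs - 2, abs(t_stat))

display "r = " %6.3f rho_hat "  N = " n_obs
display "t = " %6.3f t_stat "  p = " %6.4f p_val
```

The p-value displayed here should match the one printed under the correlation by `pwcorr perc_turnout perc_mille, sig`.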

### Regressions

Let’s start with a simple regression. Our dependent variable (DV) is voter turnout and our independent variable (IV) is the percentage of millennials.

reg perc_turnout perc_mille
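To interpret the slope, it can be useful to pull the estimate out of the stored results rather than reading it off the table. A small sketch, assuming both variables are measured in percentage points; the 10-point change is only an illustrative value.

```stata
* Re-run the simple regression quietly so the stored results are current
quietly reg perc_turnout perc_mille

* _b[] and _se[] hold the coefficient and its standard error
display "Slope:      " %8.3f _b[perc_mille]
display "Std. error: " %8.3f _se[perc_mille]

* Predicted change in turnout when the millennial share rises by 10 points
display "Change for +10 points: " %8.3f 10*_b[perc_mille]
```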


What happens if we add a control variable, perc_college, to the regression model? Is the effect of the percentage of millennials still significant?

reg perc_turnout perc_mille perc_college
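One way to compare the two models is to store each set of estimates and print them side by side. This is a sketch, not the only approach; the names `m1` and `m2` are arbitrary labels chosen here.

```stata
* Simple model
quietly reg perc_turnout perc_mille
estimates store m1

* Model with the control variable
quietly reg perc_turnout perc_mille perc_college
estimates store m2

* Coefficients with standard errors, plus N and R-squared for each model
estimates table m1 m2, b(%9.3f) se(%9.3f) stats(N r2)
```

If the coefficient on `perc_mille` shrinks noticeably between `m1` and `m2`, part of the bivariate association was picked up by `perc_college`.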


### Visualization using margins

Let’s visualize the predicted voter turnout. Look at the distribution of our IV (perc_mille) and choose a range for the plot. I choose the range from 10 to 40 in steps of 5 percentage points.

sum perc_mille, d
margins, at(perc_mille=(10(5)40)) atmeans

marginsplot, xtitle("Percentage of Millennials") ytitle("Voter Turnout") ///
title("Effect of Millennials on Voter Turnout")
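Because `margins` works from the most recently fitted model, the predictions above come from the regression that includes perc_college, and `atmeans` holds that control at its mean. The sketch below reproduces one point on the plot by hand, taking perc_mille = 20 as an example; the scalar names `college_mean` and `yhat_20` are made up for illustration.

```stata
* Refit the model with the control so the stored coefficients are current
quietly reg perc_turnout perc_mille perc_college

* Mean of the control over the estimation sample, as atmeans uses
quietly summarize perc_college if e(sample)
scalar college_mean = r(mean)

* Linear prediction at perc_mille = 20 with perc_college at its mean
scalar yhat_20 = _b[_cons] + _b[perc_mille]*20 + _b[perc_college]*college_mean
display "Predicted turnout at 20% millennials: " %6.2f yhat_20
```

The displayed value should agree with the row for perc_mille = 20 in the `margins` output.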
