# Regressions

For this correlation and regression practice, I use a county-level voter turnout dataset, which is available on request.

Load the dataset first.

```
* clear drops any data already in memory so the load does not fail
use "county_turnout.dta", clear
```

The research question for this tutorial is: Do counties with a higher percentage of millennials tend to have lower voter turnout?

Let’s make a scatter plot.

```
scatter perc_turnout perc_mille
```

Optional:

- Make it prettier using `grstyle`.
- Add the best-fit line to the scatter plot.

```
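* grstyle is a community-contributed package; install it once with: ssc install grstyle
* (some grstyle settings also rely on the palettes and colrspace packages from SSC)
grstyle init
grstyle set plain
scatter perc_turnout perc_mille
* Overlay the linear fit on the scatter plot
twoway (scatter perc_turnout perc_mille) (lfit perc_turnout perc_mille)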
```

Is there a linear relationship?

### Correlation

Let’s look at the correlation between these two variables.

```
pwcorr perc_turnout perc_mille
pwcorr perc_turnout perc_mille, sig
```

Interpret the result. Is the linear relationship between voter turnout and the percentage of millennials statistically significant?
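To help with the interpretation, here is a small sketch that prints the sample size and a significance star next to the coefficient, then pulls the coefficient out of the stored results. It assumes `county_turnout.dta` is already loaded; the 0.05 threshold is my choice for illustration, not something fixed by the tutorial.

```
* The second line under each coefficient is the p-value (sig);
* obs adds the number of observations, star(0.05) flags p < 0.05
pwcorr perc_turnout perc_mille, sig obs star(0.05)

* pwcorr leaves the coefficient and sample size in r()
display "correlation = " r(rho)
display "N = " r(N)
```

A negative coefficient with a p-value below your chosen threshold would be consistent with lower turnout in counties with more millennials.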

### Regressions

Let’s start with a simple regression. Our dependent variable (DV) is voter turnout and our independent variable (IV) is the percentage of millennials.

```
reg perc_turnout perc_mille
```
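With a single predictor, the R-squared of the regression equals the square of the correlation coefficient, so the regression and the correlation tell the same story about fit. A minimal sketch to check this on the data, assuming the dataset is loaded; the scalar name `r_xy` is just a label I made up for this example:

```
* Store the correlation coefficient in a scalar (r_xy is a hypothetical name)
quietly correlate perc_turnout perc_mille
scalar r_xy = r(rho)

* Refit the simple regression and compare r^2 with the model R-squared
quietly reg perc_turnout perc_mille
display "r squared = " scalar(r_xy)^2
display "R-squared = " e(r2)

* The slope: expected change in turnout for a one-point increase in perc_mille
display "slope = " _b[perc_mille]
```

The two printed values should agree, because both commands use the same observations when only these two variables are involved.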

What happens if we add a control variable, `perc_college`, to the regression model? Is the effect of the percentage of millennials still significant?

```
reg perc_turnout perc_mille perc_college
```
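One way to compare the two models is to store each set of estimates and print them side by side. This is only a sketch: the names `m1` and `m2` are placeholders I chose, and the display formats are arbitrary.

```
* Fit and store the simple model
quietly reg perc_turnout perc_mille
estimates store m1

* Fit and store the model with the control variable
quietly reg perc_turnout perc_mille perc_college
estimates store m2

* Coefficients with standard errors, plus N and R-squared for each model
estimates table m1 m2, b(%9.3f) se(%9.3f) stats(N r2)
```

Check whether the coefficient on `perc_mille` shrinks or changes sign once `perc_college` is held constant. Because the second regression is run last, it remains the active model for any post-estimation command that follows.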

### Visualization using `margins`

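Let’s visualize the predicted voter turnout. `margins` works from the most recently fitted model, so run it right after the regression you want to plot. Look at the distribution of our IV (`perc_mille`) and choose the range for the plot. I choose the range from 10 to 40 in steps of 5 percentage points.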

```
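* Check the distribution of perc_mille to pick a sensible plotting range
sum perc_mille, d
* Predicted turnout at perc_mille = 10, 15, ..., 40, with other covariates at their means
margins, at(perc_mille=(10(5)40)) atmeans
marginsplot, xtitle("Percentage of Millennials") ytitle("Voter Turnout") ///
    ti("Effect of Millennials on Voter Turnout")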
```
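To see what `margins` computes at each point, you can reproduce one prediction by hand from the coefficients. This sketch assumes the model with `perc_college` is the one being plotted, and picks 20 percent millennials as an arbitrary example value.

```
* Refit the model so its coefficients are the active estimates
quietly reg perc_turnout perc_mille perc_college

* Mean of the control variable over the estimation sample
quietly sum perc_college if e(sample)

* Prediction by hand: constant + slope * 20 + control coefficient * its mean
display _b[_cons] + _b[perc_mille]*20 + _b[perc_college]*r(mean)

* The same quantity from margins
margins, at(perc_mille=20) atmeans
```

The hand-computed value should match the margin that `margins` reports for `perc_mille` = 20.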