This workshop covers steps for conducting statistical analyses in
R
and preparing tables and figures.
It is assumed that you are familiar with R
(e.g., you
know about data types, data structures, and how to clean and create
variables). If this is not the case, you might want to check out the Intro
to R Workshop
Useful resources:
We need a few R packages which can be installed with the following command:
install.packages(c("haven", "stargazer", "margins", "prediction", "dotwhisker")
and the packages can be loaded with:
library(haven)
library(stargazer)
library(prediction)
library(margins)
library(dotwhisker)
library(MASS)
library(nnet)
. sysuse auto, cl(1978 Automobile Data)
. save auto.dta, replace
file auto.dta saved
. estpost summarize price mpg rep78 foreign, listwise
| e(count) e(sum_w) e(mean) e(Var) e(sd)
-------------+-------------------------------------------------------
price | 69 69 6146.043 8482308 2912.44
mpg | 69 69 21.28986 34.41475 5.866408
rep78 | 69 69 3.405797 .9799659 .9899323
foreign | 69 69 .3043478 .2148338 .4635016
| e(min) e(max) e(sum)
-------------+---------------------------------
price | 3291 15906 424077
mpg | 12 41 1469
rep78 | 1 5 235
foreign | 0 1 21
. esttab, cells("mean sd min max")
----------------------------------------------------------------
(1)
mean sd min max
----------------------------------------------------------------
price 6146.043 2912.44 3291 15906
mpg 21.28986 5.866408 12 41
rep78 3.405797 .9899323 1 5
foreign .3043478 .4635016 0 1
----------------------------------------------------------------
N 69
----------------------------------------------------------------
.
# load and summarize the data
<- read_dta('auto.dta')
autoData <- nrow(autoData)
N ## summary(autoData)
stargazer(data.frame(autoData), type='html', out = "descriptives.html",
summary.stat = c("mean", "sd", "min", "max"),
notes = paste("Number of observations =", N))
Statistic | Mean | St. Dev. | Min | Max |
price | 6,165.257 | 2,949.496 | 3,291 | 15,906 |
mpg | 21.297 | 5.786 | 12 | 41 |
rep78 | 3.406 | 0.990 | 1.000 | 5.000 |
headroom | 2.993 | 0.846 | 1.500 | 5.000 |
trunk | 13.757 | 4.277 | 5 | 23 |
weight | 3,019.459 | 777.194 | 1,760 | 4,840 |
length | 187.932 | 22.266 | 142 | 233 |
turn | 39.649 | 4.399 | 31 | 51 |
displacement | 197.297 | 91.837 | 79 | 425 |
gear_ratio | 3.015 | 0.456 | 2.190 | 3.890 |
foreign | 0.297 | 0.460 | 0 | 1 |
Number of observations = 74 |
# open the file descriptives.html, then copy and paste to excel
Here we do the following with the prestige data
prestige
: % of respondents who rated the occupation as
“good” or “excellent”educ
: % of workers in the occupation who had a HS
degreeincome
: % of workers in the occupation who earned more
than $3,500occ_type
: indicator for type of occupation, with values
blue collar (bc), professional and managerial (prof), and white collar
(wc)
. * load and save the d. use https://stats.idre.ucla.edu/stat/stata/examples/ara/duncan, clear
(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. save duncan.dta, replace
file duncan.dta saved
.
. * summarize the data and fit a linear model
. ds
occtitle income educ prestige occ_type
. regress prestige income educ i.occ_type
Source | SS df MS Number of obs = 45
-------------+---------------------------------- F(4, 40) = 105.03
Model | 39889.6897 4 9972.42242 Prob > F = 0.0000
Residual | 3797.95474 40 94.9488686 R-squared = 0.9131
-------------+---------------------------------- Adj R-squared = 0.9044
Total | 43687.6444 44 992.90101 Root MSE = 9.7442
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .5975465 .0893553 6.69 0.000 .4169527 .7781403
educ | .3453193 .1136089 3.04 0.004 .1157071 .5749315
|
occ_type |
prof | 16.65751 6.993007 2.38 0.022 2.52412 30.79091
wc | -14.66113 6.108774 -2.40 0.021 -27.00743 -2.314841
|
_cons | -.1850278 3.713771 -0.05 0.961 -7.690838 7.320782
------------------------------------------------------------------------------
.
# load and summarize the data
<- read_dta('duncan.dta')
duncan summary(duncan)
occtitle income educ prestige
Length:45 Min. : 7.00 Min. : 7.00 Min. : 3.00
Class :character 1st Qu.:21.00 1st Qu.: 26.00 1st Qu.:16.00
Mode :character Median :42.00 Median : 45.00 Median :41.00
Mean :41.87 Mean : 52.56 Mean :47.69
3rd Qu.:64.00 3rd Qu.: 84.00 3rd Qu.:81.00
Max. :81.00 Max. :100.00 Max. :97.00
occ_type
Min. :1.000
1st Qu.:1.000
Median :2.000
Mean :1.667
3rd Qu.:2.000
Max. :3.000
str(duncan$occ_type)
dbl+lbl [1:45] 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2,...
@ label : chr "Occupation type, Professional/Manag, White Collar, Blue Collar"
@ format.stata: chr "%8.0g"
@ labels : Named num [1:3] 1 2 3
..- attr(*, "names")= chr [1:3] "bc" "prof" "wc"
is.labelled(duncan$occ_type)
[1] TRUE
table(as_factor(duncan$occ_type), duncan$occ_type)
1 2 3
bc 21 0 0
prof 0 18 0
wc 0 0 6
# fit the model and print the results
<- lm(prestige ~ income + educ + factor(occ_type), data = duncan)
model1 model1
Call:
lm(formula = prestige ~ income + educ + factor(occ_type), data = duncan)
Coefficients:
(Intercept) income educ factor(occ_type)2
-0.1850 0.5975 0.3453 16.6575
factor(occ_type)3
-14.6611
# print even more results
summary(model1)
Call:
lm(formula = prestige ~ income + educ + factor(occ_type), data = duncan)
Residuals:
Min 1Q Median 3Q Max
-14.890 -5.740 -1.754 5.442 28.972
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.18503 3.71377 -0.050 0.96051
income 0.59755 0.08936 6.687 5.12e-08 ***
educ 0.34532 0.11361 3.040 0.00416 **
factor(occ_type)2 16.65751 6.99301 2.382 0.02206 *
factor(occ_type)3 -14.66113 6.10877 -2.400 0.02114 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.744 on 40 degrees of freedom
Multiple R-squared: 0.9131, Adjusted R-squared: 0.9044
F-statistic: 105 on 4 and 40 DF, p-value: < 2.2e-16
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. quietly: regress prestige income educ i.occ_type
.
. * display results
. ereturn list
scalars:
e(N) = 45
e(df_m) = 4
e(df_r) = 40
e(F) = 105.0293970832132
e(r2) = .9130657009984507
e(rmse) = 9.744171007333955
e(mss) = 39889.68969963773
e(rss) = 3797.954744806706
e(r2_a) = .9043722710982958
e(ll) = -163.6522324291697
e(ll_0) = -218.6107915932074
e(rank) = 5
macros:
e(cmdline) : "regress prestige income educ i.occ_type"
e(title) : "Linear regression"
e(marginsok) : "XB default"
e(vce) : "ols"
e(depvar) : "prestige"
e(cmd) : "regress"
e(properties) : "b V"
e(predict) : "regres_p"
e(model) : "ols"
e(estat_cmd) : "regress_estat"
matrices:
e(b) : 1 x 6
e(V) : 6 x 6
functions:
e(sample)
. display e(r2)
.9130657
. matrix list e(b)
e(b)[1,6]
1b. 2. 3.
income educ occ_type occ_type occ_type _cons
y1 .5975465 .34531933 0 16.657513 -14.661133 -.18502783
.
names(model1)
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "contrasts" "xlevels" "call" "terms"
[13] "model"
names(summary(model1))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
summary(model1)$r.squared
[1] 0.9130657
$coefficients model1
(Intercept) income educ factor(occ_type)2
-0.1850278 0.5975465 0.3453193 16.6575134
factor(occ_type)3
-14.6611334
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. quietly: regress prestige income
. eststo mod1
. quietly: regress prestige income educ
. eststo mod2
. quietly: regress prestige income educ i.occ_type
. eststo mod3
. quietly: regress prestige income educ i.occ_type c.income#c.educ
. eststo mod4
. estout mod1 mod2 mod3 mod4, cells(b(fmt(4) star) se(par)) stats(r2 N)
----------------------------------------------------------------------------
mod1 mod2 mod3 mod4
b/se b/se b/se b/se
----------------------------------------------------------------------------
income 1.0804*** 0.5987*** 0.5975*** 0.8520***
(0.1074) (0.1197) (0.0894) (0.1494)
educ 0.5458*** 0.3453** 0.6400***
(0.0983) (0.1136) (0.1788)
1.occ_type 0.0000 0.0000
(.) (.)
2.occ_type 16.6575* 13.6764
(6.9930) (6.8700)
3.occ_type -14.6611* -19.4339**
(6.1088) (6.3016)
c.income#c~c -0.0053*
(0.0025)
_cons 2.4566 -6.0647 -0.1850 -10.2146
(5.1901) (4.2719) (3.7138) (5.9971)
----------------------------------------------------------------------------
r2 0.7019 0.8282 0.9131 0.9218
N 45.0000 45.0000 45.0000 45.0000
----------------------------------------------------------------------------
. * syntax for writing results: using stataModels.csv
.
<- lm(prestige ~ income, data = duncan)
model1 <- lm(prestige ~ income + educ, data = duncan)
model2 <- update(model2, . ~ . + as_factor(occ_type), data = duncan)
model3 <- lm(prestige ~ income*educ + as_factor(occ_type), data = duncan)
model4 <- c("Income", "Education", "Professional", "White Collar",
cov_labels "Income * Education", "Intercept")
stargazer(model1, model2, model3, model4, type = "html", out = "models.html",
model.numbers = FALSE, dep.var.caption = "", dep.var.labels.include = FALSE,
column.labels = c("Model 1", "Model 2", "Model 3", "Model 4"),
star.cutoffs = c(0.05, 0.01, 0.001),
table.layout = "-c-!t-s-n",
covariate.labels = cov_labels, digits = 4, keep.stat = c("rsq", "n"))
Model 1 | Model 2 | Model 3 | Model 4 | |
Income | 1.0804*** | 0.5987*** | 0.5975*** | 0.8520*** |
(0.1074) | (0.1197) | (0.0894) | (0.1494) | |
Education | 0.5458*** | 0.3453** | 0.6400*** | |
(0.0983) | (0.1136) | (0.1788) | ||
Professional | 16.6575* | 13.6764 | ||
(6.9930) | (6.8700) | |||
White Collar | -14.6611* | -19.4339** | ||
(6.1088) | (6.3016) | |||
Income * Education | -0.0053* | |||
(0.0025) | ||||
Intercept | 2.4566 | -6.0647 | -0.1850 | -10.2146 |
(5.1901) | (4.2719) | (3.7138) | (5.9971) | |
Observations | 45 | 45 | 45 | 45 |
R2 | 0.7019 | 0.8282 | 0.9131 | 0.9218 |
Note: | p<0.05; p<0.01; p<0.001 |
## open the file models.html, then copy and paste into MS Excel
## check out option: style (journal-specific format)
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. codebook occ_type
-------------------------------------------------------------------------------
occ_type Occupation type, Professional/Manag, White Collar, Blue Collar
-------------------------------------------------------------------------------
type: numeric (byte)
label: occ_type
range: [1,3] units: 1
unique values: 3 missing .: 0/45
tabulation: Freq. Numeric Label
21 1 bc
18 2 prof
6 3 wc
. regress prestige income educ if occ_type == 1
Source | SS df MS Number of obs = 21
-------------+---------------------------------- F(2, 18) = 25.26
Model | 4806.89123 2 2403.44562 Prob > F = 0.0000
Residual | 1712.91829 18 95.1621272 R-squared = 0.7373
-------------+---------------------------------- Adj R-squared = 0.7081
Total | 6519.80952 20 325.990476 Root MSE = 9.7551
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .7834109 .1321951 5.93 0.000 .5056792 1.061143
educ | .3196229 .2829142 1.13 0.273 -.2747578 .9140035
_cons | -3.950543 6.869833 -0.58 0.572 -18.38353 10.48244
------------------------------------------------------------------------------
. regress prestige income educ if occ_type == 2
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(2, 15) = 9.09
Model | 1852.96055 2 926.480274 Prob > F = 0.0026
Residual | 1529.4839 15 101.965593 R-squared = 0.5478
-------------+---------------------------------- Adj R-squared = 0.4875
Total | 3382.44444 17 198.96732 Root MSE = 10.098
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .4142683 .1637481 2.53 0.023 .0652476 .7632891
educ | .3382139 .1589952 2.13 0.050 -.0006764 .6771043
_cons | 28.05726 12.94298 2.17 0.047 .4699643 55.64456
------------------------------------------------------------------------------
. regress prestige income educ if occ_type == 3
Source | SS df MS Number of obs = 6
-------------+---------------------------------- F(2, 3) = 8.14
Model | 587.090326 2 293.545163 Prob > F = 0.0614
Residual | 108.243008 3 36.0810025 R-squared = 0.8443
-------------+---------------------------------- Adj R-squared = 0.7405
Total | 695.333333 5 139.066667 Root MSE = 6.0067
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .4231025 .1396208 3.03 0.056 -.0212332 .8674382
educ | .4263938 .1431742 2.98 0.059 -.0292504 .882038
_cons | -10.99375 12.13372 -0.91 0.432 -49.60865 27.62116
------------------------------------------------------------------------------
.
<- lm(prestige ~ income + educ, data = duncan,
mod1_bc subset = occ_type == 1)
<- lm(prestige ~ income + educ, data = duncan,
mod1_prof subset = occ_type == 2)
<- lm(prestige ~ income + educ, data = duncan,
mod1_wc subset = occ_type == 3)
summary(mod1_bc)
Call:
lm(formula = prestige ~ income + educ, data = duncan, subset = occ_type ==
1)
Residuals:
Min 1Q Median 3Q Max
-18.2629 -6.3580 0.0742 5.1065 22.5198
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.9505 6.8698 -0.575 0.572
income 0.7834 0.1322 5.926 1.31e-05 ***
educ 0.3196 0.2829 1.130 0.273
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.755 on 18 degrees of freedom
Multiple R-squared: 0.7373, Adjusted R-squared: 0.7081
F-statistic: 25.26 on 2 and 18 DF, p-value: 5.964e-06
summary(mod1_prof)
Call:
lm(formula = prestige ~ income + educ, data = duncan, subset = occ_type ==
2)
Residuals:
Min 1Q Median 3Q Max
-15.338 -5.216 -0.416 5.920 21.833
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.0573 12.9430 2.168 0.0467 *
income 0.4143 0.1637 2.530 0.0231 *
educ 0.3382 0.1590 2.127 0.0504 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.1 on 15 degrees of freedom
Multiple R-squared: 0.5478, Adjusted R-squared: 0.4875
F-statistic: 9.086 on 2 and 15 DF, p-value: 0.002599
summary(mod1_wc)
Call:
lm(formula = prestige ~ income + educ, data = duncan, subset = occ_type ==
3)
Residuals:
9 16 21 22 23 24
-2.450 2.341 7.023 1.233 -1.551 -6.596
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.9937 12.1337 -0.906 0.4317
income 0.4231 0.1396 3.030 0.0563 .
educ 0.4264 0.1432 2.978 0.0587 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.007 on 3 degrees of freedom
Multiple R-squared: 0.8443, Adjusted R-squared: 0.7405
F-statistic: 8.136 on 2 and 3 DF, p-value: 0.06142
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. regress prestige income educ if occ_type == 1 | occ_type == 3
Source | SS df MS Number of obs = 27
-------------+---------------------------------- F(2, 24) = 24.98
Model | 5483.40426 2 2741.70213 Prob > F = 0.0000
Residual | 2634.00315 24 109.750131 R-squared = 0.6755
-------------+---------------------------------- Adj R-squared = 0.6485
Total | 8117.40741 26 312.207977 Root MSE = 10.476
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
income | .6289111 .1140262 5.52 0.000 .3935726 .8642496
educ | .10142 .1283866 0.79 0.437 -.163557 .366397
_cons | 3.763145 4.288978 0.88 0.389 -5.088871 12.61516
------------------------------------------------------------------------------
.
$occBCWC <- duncan$occ_type==1 | duncan$occ_type==3
duncantable(duncan$occBCWC, duncan$occ_type)
1 2 3
FALSE 0 18 0
TRUE 21 0 6
<- lm(prestige ~ income + educ, data = duncan,
mod1_bcwc subset = occBCWC)
summary(mod1_bcwc)
Call:
lm(formula = prestige ~ income + educ, data = duncan, subset = occBCWC)
Residuals:
Min 1Q Median 3Q Max
-17.009 -6.064 -2.194 6.147 27.351
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7631 4.2890 0.877 0.389
income 0.6289 0.1140 5.515 1.13e-05 ***
educ 0.1014 0.1284 0.790 0.437
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.48 on 24 degrees of freedom
Multiple R-squared: 0.6755, Adjusted R-squared: 0.6485
F-statistic: 24.98 on 2 and 24 DF, p-value: 1.363e-06
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. regress prestige occ_type##c.educ
Source | SS df MS Number of obs = 45
-------------+---------------------------------- F(5, 39) = 36.59
Model | 36010.99 5 7202.198 Prob > F = 0.0000
Residual | 7676.65446 39 196.837294 R-squared = 0.8243
-------------+---------------------------------- Adj R-squared = 0.8018
Total | 43687.6444 44 992.90101 Root MSE = 14.03
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
occ_type |
prof | 42.66032 19.47572 2.19 0.035 3.26697 82.05367
wc | 16.21902 23.41483 0.69 0.493 -31.14193 63.57998
|
educ | 1.011168 .3706646 2.73 0.010 .2614281 1.760908
|
occ_type#|
c.educ |
prof | -.5115146 .4222935 -1.21 0.233 -1.365684 .3426546
wc | -.6322738 .4978774 -1.27 0.212 -1.639326 .3747783
|
_cons | -2.854351 9.876664 -0.29 0.774 -22.83179 17.12309
------------------------------------------------------------------------------
. margins occ_type
Predictive margins Number of obs = 45
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
occ_type |
bc | 50.28814 10.54456 4.77 0.000 28.95977 71.61652
prof | 66.06553 6.696261 9.87 0.000 52.52106 79.61
wc | 33.27767 6.45335 5.16 0.000 20.22454 46.3308
------------------------------------------------------------------------------
. marginsplot
Variables that uniquely identify margins: occ_type
. graph export "marginsPlot.svg", replace
(file marginsPlot.svg written in SVG format)
.
$occ_factor <- factor(duncan$occ_type, levels = 1:3, labels = c("bc", "prof", "wc"))
duncan<- lm(prestige ~ educ*occ_factor, data = duncan)
m1 summary(m1)
Call:
lm(formula = prestige ~ educ * occ_factor, data = duncan)
Residuals:
Min 1Q Median 3Q Max
-22.777 -9.369 0.229 6.726 41.542
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.8544 9.8767 -0.289 0.7741
educ 1.0112 0.3707 2.728 0.0095 **
occ_factorprof 42.6603 19.4757 2.190 0.0345 *
occ_factorwc 16.2190 23.4148 0.693 0.4926
educ:occ_factorprof -0.5115 0.4223 -1.211 0.2331
educ:occ_factorwc -0.6323 0.4979 -1.270 0.2116
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 14.03 on 39 degrees of freedom
Multiple R-squared: 0.8243, Adjusted R-squared: 0.8018
F-statistic: 36.59 on 5 and 39 DF, p-value: 1.015e-13
<- prediction(m1, at = list(occ_factor = c("bc", "prof", "wc")))
p1 summary(p1)
at(occ_factor) Prediction SE z p lower upper
bc 50.29 10.545 4.769 1.850e-06 29.62 70.96
prof 66.07 6.696 9.866 5.843e-23 52.94 79.19
wc 33.28 6.453 5.157 2.514e-07 20.63 45.93
<- summary(p1)[, 1:5]
yhat1 names(yhat1) <- c('term', 'estimate', 'std.error', 'statistic', 'p.value')
$term <- c('Blue Collar', 'Professional', 'White Collar')
yhat1dwplot(yhat1)
# Stata uses a t-distribution: qt(0.975, df = n - df)
cbind(yhat1[, 2] - qt(0.975, df = 39) * yhat1[, 3],
2] + qt(0.975, df = 39) * yhat1[, 3]) yhat1[,
[,1] [,2]
[1,] 28.95977 71.61652
[2,] 52.52106 79.61000
[3,] 20.22454 46.33080
# we could code this up ourselves with the CIs based on t-dist...
. use duncan, cl(From Fox, Applied Regression Analysis. Use 'notes' command for source of data
> )
. regress prestige occ_type##c.educ
Source | SS df MS Number of obs = 45
-------------+---------------------------------- F(5, 39) = 36.59
Model | 36010.99 5 7202.198 Prob > F = 0.0000
Residual | 7676.65446 39 196.837294 R-squared = 0.8243
-------------+---------------------------------- Adj R-squared = 0.8018
Total | 43687.6444 44 992.90101 Root MSE = 14.03
------------------------------------------------------------------------------
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
occ_type |
prof | 42.66032 19.47572 2.19 0.035 3.26697 82.05367
wc | 16.21902 23.41483 0.69 0.493 -31.14193 63.57998
|
educ | 1.011168 .3706646 2.73 0.010 .2614281 1.760908
|
occ_type#|
c.educ |
prof | -.5115146 .4222935 -1.21 0.233 -1.365684 .3426546
wc | -.6322738 .4978774 -1.27 0.212 -1.639326 .3747783
|
_cons | -2.854351 9.876664 -0.29 0.774 -22.83179 17.12309
------------------------------------------------------------------------------
. margins, dydx(*)
Average marginal effects Number of obs = 45
Model VCE : OLS
Expression : Linear prediction, predict()
dy/dx w.r.t. : 2.occ_type 3.occ_type educ
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
occ_type |
prof | 15.77739 12.4911 1.26 0.214 -9.488246 41.04302
wc | -17.01048 12.36258 -1.38 0.177 -42.01615 7.995202
|
educ | .722259 .1960499 3.68 0.001 .3257106 1.118807
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
. marginsplot
Variables that uniquely identify margins: _deriv
. graph export "marginsPlotEffects.svg", replace
(file marginsPlotEffects.svg written in SVG format)
.
<- lm(prestige ~ educ*factor(occ_type), data = duncan)
m1 <- margins(m1)
marg1 summary(marg1)
factor AME SE z p lower upper
educ 0.7223 0.1960 3.6841 0.0002 0.3380 1.1065
occ_type2 15.7774 12.4911 1.2631 0.2066 -8.7047 40.2595
occ_type3 -17.0105 12.3626 -1.3760 0.1688 -41.2407 7.2197
plot(marg1)
## use the following code to save the plot as a pdf file
## pdf('marg1_plot.pdf')
## plot(marg1)
## dev.off()
## alternative functions, to pdf(), include bmp(), jpeg(),
## png(), and tiff()
. use https://stats.idre.ucla.edu/stat/stata/dae/binary.dta, cl. save gradSchool.dta, replace
file gradSchool.dta saved
.
. su admit
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
admit | 400 .3175 .4660867 0 1
.
. logit admit gre gpa i.rank
Iteration 0: log likelihood = -249.98826
Iteration 1: log likelihood = -229.66446
Iteration 2: log likelihood = -229.25955
Iteration 3: log likelihood = -229.25875
Iteration 4: log likelihood = -229.25875
Logistic regression Number of obs = 400
LR chi2(5) = 41.46
Prob > chi2 = 0.0000
Log likelihood = -229.25875 Pseudo R2 = 0.0829
------------------------------------------------------------------------------
admit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gre | .0022644 .001094 2.07 0.038 .0001202 .0044086
gpa | .8040377 .3318193 2.42 0.015 .1536838 1.454392
|
rank |
2 | -.6754429 .3164897 -2.13 0.033 -1.295751 -.0551346
3 | -1.340204 .3453064 -3.88 0.000 -2.016992 -.6634158
4 | -1.551464 .4178316 -3.71 0.000 -2.370399 -.7325287
|
_cons | -3.989979 1.139951 -3.50 0.000 -6.224242 -1.755717
------------------------------------------------------------------------------
.
.
<- read_dta('gradSchool.dta')
gradSchool <- glm(admit ~ gre + gpa + factor(rank), data = gradSchool,
logit1 family = binomial(link = "logit"))
summary(logit1)
Call:
glm(formula = admit ~ gre + gpa + factor(rank), family = binomial(link = "logit"),
data = gradSchool)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6268 -0.8662 -0.6388 1.1490 2.0790
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979 1.139951 -3.500 0.000465 ***
gre 0.002264 0.001094 2.070 0.038465 *
gpa 0.804038 0.331819 2.423 0.015388 *
factor(rank)2 -0.675443 0.316490 -2.134 0.032829 *
factor(rank)3 -1.340204 0.345306 -3.881 0.000104 ***
factor(rank)4 -1.551464 0.417832 -3.713 0.000205 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.52 on 394 degrees of freedom
AIC: 470.52
Number of Fisher Scoring iterations: 4
. use gradSchool.dta, repl. probit admit gre gpa i.rank
Iteration 0: log likelihood = -249.98826
Iteration 1: log likelihood = -229.29667
Iteration 2: log likelihood = -229.20659
Iteration 3: log likelihood = -229.20658
Probit regression Number of obs = 400
LR chi2(5) = 41.56
Prob > chi2 = 0.0000
Log likelihood = -229.20658 Pseudo R2 = 0.0831
------------------------------------------------------------------------------
admit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gre | .0013756 .0006489 2.12 0.034 .0001038 .0026473
gpa | .4777302 .1954625 2.44 0.015 .0946308 .8608297
|
rank |
2 | -.4153992 .1953769 -2.13 0.033 -.7983308 -.0324675
3 | -.812138 .2085956 -3.89 0.000 -1.220978 -.4032981
4 | -.935899 .2456339 -3.81 0.000 -1.417333 -.4544654
|
_cons | -2.386838 .6740879 -3.54 0.000 -3.708026 -1.065649
------------------------------------------------------------------------------
.
<- read_dta('gradSchool.dta')
gradSchool <- glm(admit ~ gre + gpa + factor(rank), data = gradSchool,
probit1 family = binomial(link = "probit"))
summary(probit1)
Call:
glm(formula = admit ~ gre + gpa + factor(rank), family = binomial(link = "probit"),
data = gradSchool)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6163 -0.8710 -0.6389 1.1560 2.1035
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.386836 0.673946 -3.542 0.000398 ***
gre 0.001376 0.000650 2.116 0.034329 *
gpa 0.477730 0.197197 2.423 0.015410 *
factor(rank)2 -0.415399 0.194977 -2.131 0.033130 *
factor(rank)3 -0.812138 0.208358 -3.898 9.71e-05 ***
factor(rank)4 -0.935899 0.245272 -3.816 0.000136 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.41 on 394 degrees of freedom
AIC: 470.41
Number of Fisher Scoring iterations: 4
# change reference category
$f_rank <- factor(gradSchool$rank)
gradSchool$f_rank <- relevel(gradSchool$f_rank, ref = "3")
gradSchool<- glm(admit ~ gre + gpa + f_rank, data = gradSchool,
probit2 family = binomial(link = "probit"))
summary(probit2)
Call:
glm(formula = admit ~ gre + gpa + f_rank, family = binomial(link = "probit"),
data = gradSchool)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6163 -0.8710 -0.6389 1.1560 2.1035
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.198975 0.670818 -4.769 1.85e-06 ***
gre 0.001376 0.000650 2.116 0.0343 *
gpa 0.477730 0.197197 2.423 0.0154 *
f_rank1 0.812138 0.208358 3.898 9.71e-05 ***
f_rank2 0.396739 0.168054 2.361 0.0182 *
f_rank4 -0.123761 0.223955 -0.553 0.5805
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 458.41 on 394 degrees of freedom
AIC: 470.41
Number of Fisher Scoring iterations: 4
. use https://stats.idre.ucla.edu/stat/data/hsbdemo, cl(highschool and beyond (200 cases))
. save hsbdemo, replace
file hsbdemo.dta saved
. mlogit prog i.ses write, base(2)
Iteration 0: log likelihood = -204.09667
Iteration 1: log likelihood = -180.80105
Iteration 2: log likelihood = -179.98724
Iteration 3: log likelihood = -179.98173
Iteration 4: log likelihood = -179.98173
Multinomial logistic regression Number of obs = 200
LR chi2(6) = 48.23
Prob > chi2 = 0.0000
Log likelihood = -179.98173 Pseudo R2 = 0.1182
------------------------------------------------------------------------------
prog | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
general |
ses |
middle | -.533291 .4437321 -1.20 0.229 -1.40299 .336408
high | -1.162832 .5142195 -2.26 0.024 -2.170684 -.1549804
|
write | -.0579284 .0214109 -2.71 0.007 -.0998931 -.0159637
_cons | 2.852186 1.166439 2.45 0.014 .5660075 5.138365
-------------+----------------------------------------------------------------
academic | (base outcome)
-------------+----------------------------------------------------------------
vocation |
ses |
middle | .2913931 .4763737 0.61 0.541 -.6422822 1.225068
high | -.9826703 .5955669 -1.65 0.099 -2.14996 .1846195
|
write | -.1136026 .0222199 -5.11 0.000 -.1571528 -.0700524
_cons | 5.2182 1.163549 4.48 0.000 2.937686 7.498714
------------------------------------------------------------------------------
.
<- read_dta('hsbdemo.dta')
hsbdemo summary(hsbdemo)
id female ses schtyp
Min. : 1.00 Min. :0.000 Min. :1.000 Min. :1.00
1st Qu.: 50.75 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.00
Median :100.50 Median :1.000 Median :2.000 Median :1.00
Mean :100.50 Mean :0.545 Mean :2.055 Mean :1.16
3rd Qu.:150.25 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.00
Max. :200.00 Max. :1.000 Max. :3.000 Max. :2.00
prog read write math
Min. :1.000 Min. :28.00 Min. :31.00 Min. :33.00
1st Qu.:2.000 1st Qu.:44.00 1st Qu.:45.75 1st Qu.:45.00
Median :2.000 Median :50.00 Median :54.00 Median :52.00
Mean :2.025 Mean :52.23 Mean :52.77 Mean :52.65
3rd Qu.:2.250 3rd Qu.:60.00 3rd Qu.:60.00 3rd Qu.:59.00
Max. :3.000 Max. :76.00 Max. :67.00 Max. :75.00
science socst honors awards cid
Min. :26.00 Min. :26.00 Min. :0.000 Min. :0.00 Min. : 1.00
1st Qu.:44.00 1st Qu.:46.00 1st Qu.:0.000 1st Qu.:0.00 1st Qu.: 5.00
Median :53.00 Median :52.00 Median :0.000 Median :1.00 Median :10.50
Mean :51.85 Mean :52.41 Mean :0.265 Mean :1.67 Mean :10.43
3rd Qu.:58.00 3rd Qu.:61.00 3rd Qu.:1.000 3rd Qu.:2.00 3rd Qu.:15.00
Max. :74.00 Max. :71.00 Max. :1.000 Max. :7.00 Max. :20.00
$f_prog <- factor( hsbdemo$prog)
hsbdemo$f_prog <- relevel(hsbdemo$f_prog, ref = "2")
hsbdemo<- multinom(f_prog ~ factor(ses) + write, data = hsbdemo) mnlog1
# weights: 15 (8 variable)
initial value 219.722458
iter 10 value 179.982880
final value 179.981726
converged
summary(mnlog1)
Call:
multinom(formula = f_prog ~ factor(ses) + write, data = hsbdemo)
Coefficients:
(Intercept) factor(ses)2 factor(ses)3 write
1 2.852198 -0.5332810 -1.1628226 -0.0579287
3 5.218260 0.2913859 -0.9826649 -0.1136037
Std. Errors:
(Intercept) factor(ses)2 factor(ses)3 write
1 1.166441 0.4437323 0.5142196 0.02141097
3 1.163552 0.4763739 0.5955665 0.02221996
Residual Deviance: 359.9635
AIC: 375.9635
. use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, cl. save schoolAbsence, replace
file schoolAbsence.dta saved
. poisson daysabs math i.prog
Iteration 0: log likelihood = -1328.6751
Iteration 1: log likelihood = -1328.6425
Iteration 2: log likelihood = -1328.6425
Poisson regression Number of obs = 314
LR chi2(3) = 443.73
Prob > chi2 = 0.0000
Log likelihood = -1328.6425 Pseudo R2 = 0.1431
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math | -.0068084 .0009311 -7.31 0.000 -.0086332 -.0049835
|
prog |
2 | -.4398975 .056672 -7.76 0.000 -.5509725 -.3288224
3 | -1.281364 .0778898 -16.45 0.000 -1.434025 -1.128703
|
_cons | 2.651974 .0607367 43.66 0.000 2.532932 2.771015
------------------------------------------------------------------------------
.
.
<- read_dta('schoolAbsence.dta')
schoolAbsence <- glm(daysabs ~ math + factor(prog), data = schoolAbsence,
pois1 family=poisson(link = "log"))
summary(pois1)
Call:
glm(formula = daysabs ~ math + factor(prog), family = poisson(link = "log"),
data = schoolAbsence)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.2597 -2.2038 -0.9193 0.6511 7.4233
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.651974 0.060736 43.664 < 2e-16 ***
math -0.006808 0.000931 -7.313 2.62e-13 ***
factor(prog)2 -0.439897 0.056672 -7.762 8.35e-15 ***
factor(prog)3 -1.281364 0.077886 -16.452 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2217.7 on 313 degrees of freedom
Residual deviance: 1774.0 on 310 degrees of freedom
AIC: 2665.3
Number of Fisher Scoring iterations: 5
. use schoolAbsence, repl. nbreg daysabs math i.prog
Fitting Poisson model:
Iteration 0: log likelihood = -1328.6751
Iteration 1: log likelihood = -1328.6425
Iteration 2: log likelihood = -1328.6425
Fitting constant-only model:
Iteration 0: log likelihood = -899.27009
Iteration 1: log likelihood = -896.47264
Iteration 2: log likelihood = -896.47237
Iteration 3: log likelihood = -896.47237
Fitting full model:
Iteration 0: log likelihood = -870.49809
Iteration 1: log likelihood = -865.90381
Iteration 2: log likelihood = -865.62942
Iteration 3: log likelihood = -865.6289
Iteration 4: log likelihood = -865.6289
Negative binomial regression Number of obs = 314
LR chi2(3) = 61.69
Dispersion = mean Prob > chi2 = 0.0000
Log likelihood = -865.6289 Pseudo R2 = 0.0344
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math | -.005993 .0025072 -2.39 0.017 -.010907 -.001079
|
prog |
2 | -.44076 .182576 -2.41 0.016 -.7986025 -.0829175
3 | -1.278651 .2019811 -6.33 0.000 -1.674526 -.882775
|
_cons | 2.615265 .1963519 13.32 0.000 2.230423 3.000108
-------------+----------------------------------------------------------------
/lnalpha | -.0321895 .1027882 -.2336506 .1692717
-------------+----------------------------------------------------------------
alpha | .9683231 .0995322 .7916384 1.184442
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 926.03 Prob >= chibar2 = 0.000
.
<- read_dta('schoolAbsence.dta')
schoolAbsence <- glm.nb(daysabs ~ math + factor(prog), data = schoolAbsence)
nbreg1 summary(nbreg1)
Call:
glm.nb(formula = daysabs ~ math + factor(prog), data = schoolAbsence,
init.theta = 1.032713156, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1547 -1.0192 -0.3694 0.2285 2.5273
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.615265 0.197460 13.245 < 2e-16 ***
math -0.005993 0.002505 -2.392 0.0167 *
factor(prog)2 -0.440760 0.182610 -2.414 0.0158 *
factor(prog)3 -1.278651 0.200720 -6.370 1.89e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(1.0327) family taken to be 1)
Null deviance: 427.54 on 313 degrees of freedom
Residual deviance: 358.52 on 310 degrees of freedom
AIC: 1741.3
Number of Fisher Scoring iterations: 1
Theta: 1.033
Std. Err.: 0.106
2 x log-likelihood: -1731.258
# Stata parameterization is 1 / theta