#delimit ;
clear;
capture log close;
log using lab6.log, replace;

***********************************************************
LAB6.DO is a Stata do-file for Lab 6 that runs various
limited dependent variable regressions on data from the
March Current Population Survey.

- written by Dan Rosenbaum, 2006

Located at my web-site for this course
(http://www.uncg.edu/bae/people/rosenbaum) is a file
containing a 1 in 100 subsample of all persons in the 1997
March CPS who are 25-54 and not in the armed forces. The
file is named cps97.raw and is stored as space-delimited
ASCII. Here is a short description of the variables in the
order that they are found in the data.

AGE      = age in years
RACE     = 1 if white
         = 2 if black
         = 3 if other
FEMALE   = 1 if female, 0 otherwise
EDUCATT  = 11 if high school dropout
         = 12 if high school graduate
         = 14 if some college
         = 16 if bachelors degree
         = 18 if masters degree or above
EARN     = annual earnings in nominal dollars
WEEKS    = total weeks worked last year
HOURS    = total hours worked last year
NUMKID   = number of children
MARSTAT  = 1 if married with spouse present
         = 2 if married but spouse absent
         = 3 if separated
         = 4 if divorced
         = 5 if never married
         = 6 if widowed
WGT      = March Supplement weight
YEAR     = year (four digits) in July of previous year
STATE    = state of residence, alphabetical order (1-51)
INSCHOOL = 1 if attending school, 0 otherwise
UR       = state unemployment rate (in percentage points)
***********************************************************;

***********************************************************
I start by reading in the data with an INFILE statement,
since the data are space-delimited rather than
tab-delimited. I also calculate summary statistics for the
sample.
***********************************************************;

***********************************************************
Below I read in the data and restrict the sample to females
25-54 who are not in school. Then I create a series of
variables that I use later in the program.
***********************************************************;

infile age race female educatt earn weeks hours numkid marstat
   wgt year state inschool ur using cps97;
drop if age<25 | age>54 | female==0 | inschool==1;
gen hearn=earn/hours;
replace hearn=4 if hearn>0 & hearn<4;
gen lnhearn=log(hearn);
gen work=hours>0;
gen nonwhite=race>1;
gen age3544=age>=35 & age<=44;
gen age4554=age>=45 & age<=54;
gen hsdrop=educatt==11;
gen somecol=educatt==14;
gen ba=educatt==16;
gen ma=educatt==18;
gen married=marstat==1 | marstat==2;
gen separatd=marstat==3;
gen divorced=marstat==4;
gen nevmarry=marstat==5;
gen widowed=marstat==6;
gen kids=numkid>0;
gen mkids=married*kids;
sum;

***********************************************************
Here I run a linear probability model (OLS) examining the
relationship between marriage and race, age, and educational
attainment. Note the ROBUST option, which computes
heteroskedasticity-robust standard errors. This is necessary
because linear probability models are ALWAYS
heteroskedastic: with a binary outcome, Var(u|x) =
p(x)(1 - p(x)), where p(x) = xb is the probability of being
married, so the error variance changes with x whenever any
slope coefficient is nonzero. PREDICT then stores the fitted
probabilities, which in a linear probability model are not
constrained to lie between 0 and 1.
***********************************************************;
reg married nonwhite age3544 age4554 hsdrop somecol ba ma, robust;
predict pmarried;
sum pmarried;

***********************************************************
Here I run a probit and calculate the average derivative
with respect to the high school dropout dummy, that is, the
sample mean of normd(xb)*_b[hsdrop]. normd(x) gives the
standard normal PDF evaluated at x (the same function is
called normalden() in newer versions of Stata), and
_b[varname] gives the coefficient estimate on varname.
***********************************************************;
probit married nonwhite age3544 age4554 hsdrop somecol ba ma;
predict xb_p, xb;
gen d_hsd_p=normd(xb_p)*_b[hsdrop];
sum d_hsd_p;

***********************************************************
Here I use DPROBIT to calculate marginal effects at the mean
levels of the explanatory variables. For dummy regressors
such as HSDROP, DPROBIT reports the discrete change in the
probability as the dummy goes from 0 to 1.
***********************************************************;
dprobit married nonwhite age3544 age4554 hsdrop somecol ba ma;

***********************************************************
Here I run a logit and again calculate the average
derivative with respect to the high school dropout dummy.
The logistic density at xb is exp(xb)/(1+exp(xb))^2, so
each observation's derivative is that density times
_b[hsdrop].
***********************************************************;
logit married nonwhite age3544 age4554 hsdrop somecol ba ma;
predict xb_l, xb;
gen d_hsd_l=exp(xb_l)/((1+exp(xb_l))^2)*_b[hsdrop];
sum d_hsd_l;

***********************************************************
Here I run an OLS regression of hourly earnings on race,
age, and educational attainment.
***********************************************************;
reg hearn nonwhite age3544 age4554 hsdrop somecol ba ma;

***********************************************************
Here I run a Tobit model that treats hourly earnings of $30
or more as censored at $30 (HEARNT). For comparison I also
run OLS on the censored variable. With censoring from above,
the marginal effect of a regressor on expected observed
earnings is Pr(uncensored | x) times its Tobit coefficient.
I approximate Pr(uncensored | x) with the fitted probability
P_NOCENS from a probit of NOCENSOR on the regressors other
than HSDROP, then average the resulting marginal effect for
the high school dropout dummy. _b[varname] gives the
coefficient estimate on varname.
***********************************************************;
gen hearnt=hearn;
replace hearnt=30 if hearn>=30 & hearn~=.;
gen nocensor=hearnt<30 if hearnt~=.;
probit nocensor nonwhite age3544 age4554 somecol ba ma;
predict p_nocens if hearnt~=.;
reg hearnt nonwhite age3544 age4554 hsdrop somecol ba ma;
tobit hearnt nonwhite age3544 age4554 hsdrop somecol ba ma if hearnt~=., ul(30);
gen d_hsd_t=p_nocens*_b[hsdrop] if hearnt~=.;
sum hearnt nocensor p_nocens d_hsd_t;

***********************************************************
Here I run a Heckman two-step selection model. The outcome,
HEARN2, is hourly earnings set to missing for women who did
not work or whose hourly earnings are zero or negative
(positive values below $4 were already raised to $4). These
women form the unselected group. In the selection equation
I use the state unemployment rate (UR), whether the woman
has kids (KIDS), whether she is married (MARRIED), and the
married-with-kids interaction (MKIDS) as additional
regressors.
***********************************************************;
gen hearn2=hearn;
replace hearn2=. if hearn<4;
heckman hearn2 nonwhite age3544 age4554 hsdrop somecol ba ma,
   select(nonwhite age3544 age4554 hsdrop somecol ba ma ur kids married mkids) twostep;
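
***********************************************************
Optional cross-check of the probit average derivative (a
sketch that assumes Stata 11 or later, where MARGINS is
available). HSDROP enters as an ordinary regressor rather
than a factor variable. MARGINS with DYDX() therefore
averages the derivative normalden(xb)*_b[hsdrop] over the
estimation sample, so it should reproduce the mean of
D_HSD_P. The scalar names AVG_HSD_P and AME_HSD_P are
illustrative.
***********************************************************;
quietly probit married nonwhite age3544 age4554 hsdrop somecol ba ma;
quietly sum d_hsd_p if e(sample);
scalar avg_hsd_p=r(mean);
margins, dydx(hsdrop) post;
scalar ame_hsd_p=_b[hsdrop];
display "Probit average derivative, by hand: " scalar(avg_hsd_p)
   "   via MARGINS: " scalar(ame_hsd_p);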
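
***********************************************************
Optional cross-check of the logit average derivative (a
sketch that assumes Stata 11 or later, where MARGINS is
available). For a logit, MARGINS with DYDX() averages
L(xb)*(1-L(xb))*_b[hsdrop], where L is the logistic CDF.
This equals exp(xb)/(1+exp(xb))^2*_b[hsdrop], so the result
should match the mean of D_HSD_L. The scalar names
AVG_HSD_L and AME_HSD_L are illustrative.
***********************************************************;
quietly logit married nonwhite age3544 age4554 hsdrop somecol ba ma;
quietly sum d_hsd_l if e(sample);
scalar avg_hsd_l=r(mean);
margins, dydx(hsdrop) post;
scalar ame_hsd_l=_b[hsdrop];
display "Logit average derivative, by hand: " scalar(avg_hsd_l)
   "   via MARGINS: " scalar(ame_hsd_l);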
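
***********************************************************
Optional variant of the Tobit marginal effect (a sketch
that assumes PREDICT after TOBIT supports the PR(a,b)
option, where a missing lower limit means minus infinity).
This replaces the separate probit used for P_NOCENS. Here
the probability that latent hourly earnings fall below 30
comes from the Tobit estimates themselves, so
Pr(y* < 30 | x) = normal((30 - xb)/sigma). P_UNC_T and
D_HSD_T2 are illustrative names.
***********************************************************;
quietly tobit hearnt nonwhite age3544 age4554 hsdrop somecol ba ma, ul(30);
predict p_unc_t if e(sample), pr(.,30);
gen d_hsd_t2=p_unc_t*_b[hsdrop] if e(sample);
sum p_nocens p_unc_t d_hsd_t d_hsd_t2;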
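
***********************************************************
Optional illustration of what the TWOSTEP option computes
(a sketch of the textbook Heckit procedure, assuming Stata
10 or later for the NORMALDEN() and NORMAL() functions).
Step 1 is a probit for whether HEARN2 is observed, using
the same selection regressors. Step 2 adds the inverse
Mills ratio normalden(zg)/normal(zg) to the earnings
equation. The OLS point estimates should match the
HECKMAN, TWOSTEP coefficients, and the coefficient on
MILLS_HK corresponds to the LAMBDA that HECKMAN reports.
The OLS standard errors are not valid, however, because
they ignore that the Mills ratio is itself estimated.
SEL_HK, ZG_HK and MILLS_HK are illustrative names.
***********************************************************;
gen sel_hk=hearn2~=.;
quietly probit sel_hk nonwhite age3544 age4554 hsdrop somecol ba ma ur kids married mkids;
predict zg_hk, xb;
gen mills_hk=normalden(zg_hk)/normal(zg_hk);
reg hearn2 nonwhite age3544 age4554 hsdrop somecol ba ma mills_hk;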