-------------------------------------------------------------------------------
       log:  /afs/uncg.edu/html/bae/people/rosenbaum/643/lab6.log
  log type:  text
 opened on:  23 Aug 2006, 22:26:51

. ***********************************************************
> LAB6.DO is a STATA do-file for Lab 6 that runs various
> limited dependent variable regressions on data from the
> March Current Population Survey.
> - written by Dan Rosenbaum, 2006
>
> Located at my web-site for this course
> (http://www.uncg.edu/bae/people/rosenbaum)
> is a file containing a 1 in 100 subsample of all persons
> in the 1997 March CPS who are 25-54 and not in the armed
> forces. The file is named cps97.raw and is stored as
> space-delimited ASCII. Here is a short description of
> the variables in the order that they are found in the data.
>
> AGE      = age in years
> RACE     = 1 if white
>          = 2 if black
>          = 3 if other
> FEMALE   = 1 if female, 0 otherwise
> EDUCATT  = 11 if high school dropout
>          = 12 if high school graduate
>          = 14 if some college
>          = 16 if bachelor's degree
>          = 18 if master's degree or above
> EARN     = annual earnings in nominal dollars
> WEEKS    = total weeks worked last year
> HOURS    = total hours worked last year
> NUMKID   = number of children
> MARSTAT  = 1 if married with spouse present
>          = 2 if married but spouse absent
>          = 3 if separated
>          = 4 if divorced
>          = 5 if never married
>          = 6 if widowed
> WGT      = March Supplement Weight
> YEAR     = year (four digits) as of July of the previous year
> STATE    = state of residence, alphabetical order (1-51)
> INSCHOOL = 1 if attending school, 0 otherwise
> UR       = state unemployment rate (in percentage points)
> ***********************************************************;

. ***********************************************************
> I start by reading in the data with an INFILE statement,
> since the data are space-delimited rather than tab-delimited.
> I also calculate summary statistics for the sample.
> ***********************************************************;
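Our reading of free-format INFILE is that it takes whitespace-separated values and assigns them to the listed variables in order, without requiring exactly one observation per line. Here is a minimal Python sketch of that step. It assumes every value is numeric (no "." missing codes). The function name `read_free_format` and the two sample records are hypothetical and do not come from cps97.raw.

```python
# Variable names in the order the header above lists them
# (the same order used in the INFILE statement).
CPS_VARIABLES = [
    "age", "race", "female", "educatt", "earn", "weeks", "hours",
    "numkid", "marstat", "wgt", "year", "state", "inschool", "ur",
]


def read_free_format(text, names=CPS_VARIABLES):
    """Group whitespace-separated values into records of len(names) fields.

    Line breaks are ignored: values simply fill the variables in order.
    Assumes every token is numeric (no '.' missing-value codes).
    """
    tokens = text.split()
    width = len(names)
    if len(tokens) % width != 0:
        raise ValueError(
            f"{len(tokens)} values is not a multiple of {width} variables"
        )
    return [
        dict(zip(names, map(float, tokens[i:i + width])))
        for i in range(0, len(tokens), width)
    ]


# Hypothetical input: two people, the second record split across lines.
sample = """30 1 1 12 20000 52 2080 0 5 1500.5 1996 10 0 5.1
45 2 1 16 35000 50 2000 2 1 2100 1996 33
0 4.4"""
records = read_free_format(sample)
print(len(records), records[1]["educatt"], records[1]["ur"])
```

A value count that is not a multiple of 14 raises an error. In practice, that usually means the variable list and the file layout disagree.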
. ***********************************************************
> Below I read in the data and restrict the sample to women
> aged 25-54 who are not in school. Then I create a series of
> variables that I use later in the program.
> ***********************************************************;

. infile age race female educatt earn weeks hours numkid marstat
> wgt year state inschool ur using cps97;
(560 observations read)

. drop if age<25 | age>54 | female==0 | inschool==1;
(258 observations deleted)

. gen hearn=earn/hours;
(64 missing values generated)

. replace hearn=4 if hearn>0 & hearn<4;
(12 real changes made)

. gen lnhearn=log(hearn);
(64 missing values generated)

. gen work=hours>0;

. gen nonwhite=race>1;

. gen age3544=age>=35 & age<=44;

. gen age4554=age>=45 & age<=54;

. gen hsdrop=educatt==11;

. gen somecol=educatt==14;

. gen ba=educatt==16;

. gen ma=educatt==18;

. gen married=marstat==1 | marstat==2;

. gen separatd=marstat==3;

. gen divorced=marstat==4;

. gen nevmarry=marstat==5;

. gen widowed=marstat==6;

. gen kids=numkid>0;

. gen mkids=married*kids;

. sum;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         age |     302    38.86424    8.013161         25         54
        race |     302    1.221854    .5651442          1          3
      female |     302           1           0          1          1
     educatt |     302    13.72185    2.046499         11         18
        earn |     302    19311.93    23022.93          0   171998.5
-------------+--------------------------------------------------------
       weeks |     302    36.95695    21.54085          0         52
       hours |     302    1420.255    944.9039          0       3900
      numkid |     302    1.099338    1.171392          0          6
     marstat |     302    2.245033    1.652409          1          6
         wgt |     302    1958.841    953.0698     222.61    5711.49
-------------+--------------------------------------------------------
        year |     302        1996           0       1996       1996
       state |     302    26.11258    14.12547          1         51
    inschool |     302           0           0          0          0
          ur |     302    5.433444    1.146962        2.9        8.5
       hearn |     238    15.34198    33.46079          4    500.025
-------------+--------------------------------------------------------
     lnhearn |     238    2.399553    .6474229   1.386294   6.214658
        work |     302    .7880795    .4093471          0          1
    nonwhite |     302    .1490066     .356686          0          1
     age3544 |     302    .4072848    .4921442          0          1
     age4554 |     302    .2649007    .4420127          0          1
-------------+--------------------------------------------------------
      hsdrop |     302    .1192053    .3245677          0          1
     somecol |     302    .2748344    .4471718          0          1
          ba |     302    .2284768    .4205482          0          1
          ma |     302    .0629139     .243211          0          1
     married |     302    .6225166    .4855619          0          1
-------------+--------------------------------------------------------
    separatd |     302    .0562914    .2308661          0          1
    divorced |     302    .1788079    .3838274          0          1
    nevmarry |     302    .1258278    .3322057          0          1
     widowed |     302    .0165563    .1278134          0          1
        kids |     302     .589404    .4927585          0          1
-------------+--------------------------------------------------------
       mkids |     302    .4271523    .4954858          0          1

. ***********************************************************
> Here I run a linear probability model (OLS) examining
> the relationship between marriage and race, age, and
> educational attainment. Note the ROBUST option, which
> computes heteroskedasticity-robust standard errors. This is
> necessary because a linear probability model is always
> heteroskedastic: the error variance, p(1-p), changes with
> the regressors whenever any slope coefficient is nonzero.
> ***********************************************************;

. reg married nonwhite age3544 age4554 hsdrop somecol ba ma, robust;

Regression with robust standard errors                 Number of obs =     302
                                                       F(  7,   294) =    1.49
                                                       Prob > F      =  0.1707
                                                       R-squared     =  0.0363
                                                       Root MSE      =  .48231

------------------------------------------------------------------------------
             |               Robust
     married |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |  -.0059242   .0793765    -0.07   0.941    -.1621424     .150294
     age3544 |    .084056   .0669264     1.26   0.210    -.0476595    .2157715
     age4554 |   .1083767   .0746957     1.45   0.148    -.0386293    .2553828
      hsdrop |  -.2570551   .0975095    -2.64   0.009    -.4489602   -.0651501
     somecol |  -.0213123   .0724545    -0.29   0.769    -.1639075    .1212828
          ba |  -.0732936   .0777178    -0.94   0.346    -.2262473    .0796602
          ma |  -.0061247   .1209275    -0.05   0.960    -.2441179    .2318685
       _cons |   .6140864   .0702384     8.74   0.000     .4758527    .7523201
------------------------------------------------------------------------------

. predict pmarried;
(option xb assumed; fitted values)

. sum pmarried;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    pmarried |     302    .6225166     .092481   .3511071   .7224631

. ***********************************************************
> Here I run a probit and calculate the average derivative
> with respect to the high school dropout dummy (treating it
> as if it were continuous).
> normd(x) gives the standard normal PDF evaluated at x.
> _b[x] gives the coefficient estimate for x.
> ***********************************************************;

. probit married nonwhite age3544 age4554 hsdrop somecol ba ma;

Iteration 0:   log likelihood = -200.17125
Iteration 1:   log likelihood = -194.78939
Iteration 2:   log likelihood = -194.78848

Probit estimates                                  Number of obs   =        302
                                                  LR chi2(7)      =      10.77
                                                  Prob > chi2     =     0.1492
Log likelihood = -194.78848                       Pseudo R2       =     0.0269

------------------------------------------------------------------------------
     married |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |  -.0191053   .2087043    -0.09   0.927    -.4281582    .3899477
     age3544 |    .225139   .1779158     1.27   0.206    -.1235696    .5738476
     age4554 |   .2865838   .1983079     1.45   0.148    -.1020926    .6752602
      hsdrop |  -.6628425   .2508056    -2.64   0.008    -1.154412   -.1712725
     somecol |  -.0554251   .2005589    -0.28   0.782    -.4485132    .3376631
          ba |  -.1955266   .2058057    -0.95   0.342    -.5988984    .2078452
          ma |  -.0201625   .3277007    -0.06   0.951    -.6624441    .6221191
       _cons |   .2939451    .186951     1.57   0.116    -.0724721    .6603622
------------------------------------------------------------------------------

. predict xb_p, xb;

. gen d_hsd_p=normd(xb_p)*_b[hsdrop];

. sum d_hsd_p;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     d_hsd_p |     302   -.2445645    .0129444  -.2636055  -.2234286

. ***********************************************************
> Here I use DPROBIT to calculate marginal effects at the
> means of the explanatory variables.
> ***********************************************************;

. dprobit married nonwhite age3544 age4554 hsdrop somecol ba ma;

Iteration 0:   log likelihood = -200.17125
Iteration 1:   log likelihood = -194.78939
Iteration 2:   log likelihood = -194.78848

Probit estimates                                  Number of obs   =        302
                                                  LR chi2(7)      =      10.77
                                                  Prob > chi2     =     0.1492
Log likelihood = -194.78848                       Pseudo R2       =     0.0269

------------------------------------------------------------------------------
 married |      dF/dx   Std. Err.      z    P>|z|     x-bar  [    95% C.I.   ]
---------+--------------------------------------------------------------------
nonwhite*|  -.0072601    .079472    -0.09   0.927   .149007  -.163022  .148502
 age3544*|   .0846324   .0661418     1.27   0.206   .407285  -.045003  .214268
 age4554*|   .1058185   .0709068     1.45   0.148   .264901  -.033156  .244793
  hsdrop*|  -.2591272   .0960886    -2.64   0.008   .119205  -.447457 -.070797
 somecol*|  -.0210973   .0766156    -0.28   0.782   .274834  -.171261  .129066
      ba*|  -.0751925   .0800126    -0.95   0.342   .228477  -.232014  .081629
      ma*|  -.0076669   .1249456    -0.06   0.951   .062914  -.252556  .237222
---------+--------------------------------------------------------------------
  obs. P |   .6225166
 pred. P |   .6249552  (at x-bar)
------------------------------------------------------------------------------
(*) dF/dx is for discrete change of dummy variable from 0 to 1
    z and P>|z| are the test of the underlying coefficient being 0

. ***********************************************************
> Here I run a logit and again calculate the average
> derivative with respect to the high school dropout dummy.
> exp(x)/((1+exp(x))^2) gives the logistic PDF evaluated at x.
> _b[x] gives the coefficient estimate for x.
> ***********************************************************;

. logit married nonwhite age3544 age4554 hsdrop somecol ba ma;

Iteration 0:   log likelihood = -200.17125
Iteration 1:   log likelihood = -194.78581
Iteration 2:   log likelihood =  -194.7764
Iteration 3:   log likelihood =  -194.7764

Logit estimates                                   Number of obs   =        302
                                                  LR chi2(7)      =      10.79
                                                  Prob > chi2     =     0.1481
Log likelihood =  -194.7764                       Pseudo R2       =     0.0270

------------------------------------------------------------------------------
     married |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |  -.0281248   .3408778    -0.08   0.934    -.6962332    .6399835
     age3544 |   .3607255   .2877856     1.25   0.210     -.203324     .924775
     age4554 |   .4705345   .3233698     1.46   0.146    -.1632587    1.104328
      hsdrop |  -1.073874   .4071293    -2.64   0.008    -1.871833   -.2759156
     somecol |  -.0988672   .3270198    -0.30   0.762    -.7398143    .5420799
          ba |  -.3242755   .3359438    -0.97   0.334    -.9827132    .3341622
          ma |  -.0274623   .5428643    -0.05   0.960    -1.091457    1.036532
       _cons |   .4766931   .3037447     1.57   0.117    -.1186355    1.072022
------------------------------------------------------------------------------

. predict xb_l, xb;

. gen d_hsd_l=exp(xb_l)/((1+exp(xb_l))^2)*_b[hsdrop];

. sum d_hsd_l;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     d_hsd_l |     302   -.2431928     .016399  -.2674344  -.2162294

. ***********************************************************
> Here I run an OLS regression of hourly earnings on
> race, age, and educational attainment.
> ***********************************************************;

. reg hearn nonwhite age3544 age4554 hsdrop somecol ba ma;

      Source |       SS       df       MS              Number of obs =     238
-------------+------------------------------           F(  7,   230) =    1.80
       Model |  13797.1264     7  1971.01805           Prob > F      =  0.0878
    Residual |    251553.8   230  1093.71217           R-squared     =  0.0520
-------------+------------------------------           Adj R-squared =  0.0231
       Total |  265350.926   237  1119.62416           Root MSE      =  33.071

------------------------------------------------------------------------------
       hearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |   4.685858   6.249023     0.75   0.454    -7.626791    16.99851
     age3544 |  -5.277055   5.204341    -1.01   0.312    -15.53133    4.977224
     age4554 |   -9.22538   5.766432    -1.60   0.111    -20.58716    2.136403
      hsdrop |  -4.176518   8.867551    -0.47   0.638    -21.64854     13.2955
     somecol |    .232389   5.623177     0.04   0.967    -10.84713    11.31191
          ba |   13.96815   5.800704     2.41   0.017      2.53884    25.39746
          ma |   17.09051   8.855975     1.93   0.055    -.3586957    34.53972
       _cons |   15.02839   5.335862     2.82   0.005     4.514972    25.54181
------------------------------------------------------------------------------

. ***********************************************************
> Here I run a Tobit model that treats hourly earnings of
> $30 or more as right-censored at $30. I also calculate the
> marginal effect of the high school dropout dummy on
> expected hourly earnings, scaling its coefficient by the
> predicted probability of being uncensored from a probit.
> _b[x] gives the coefficient estimate for x.
> ***********************************************************;

. gen hearnt=hearn;
(64 missing values generated)

. replace hearnt=30 if hearn>=30 & hearn~=.;
(12 real changes made)

. gen nocensor=hearnt<30 if hearnt~=.;
(64 missing values generated)

. probit nocensor nonwhite age3544 age4554 somecol ba ma;

Iteration 0:   log likelihood = -47.540631
Iteration 1:   log likelihood = -37.412101
Iteration 2:   log likelihood = -36.586909
Iteration 3:   log likelihood = -36.557573
Iteration 4:   log likelihood = -36.557493
Iteration 5:   log likelihood = -36.557493

Probit estimates                                  Number of obs   =        238
                                                  LR chi2(6)      =      21.97
                                                  Prob > chi2     =     0.0012
Log likelihood = -36.557493                       Pseudo R2       =     0.2310

------------------------------------------------------------------------------
    nocensor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |  -.8423677   .3967697    -2.12   0.034    -1.620022   -.0647133
     age3544 |  -.2156846   .3945009    -0.55   0.585    -.9888921    .5575229
     age4554 |   .1429229   .4601751     0.31   0.756    -.7590037     1.04485
     somecol |  -.0241524   .5830111    -0.04   0.967    -1.166833    1.118528
          ba |  -1.199136   .4501916    -2.66   0.008    -2.081495   -.3167764
          ma |   -1.55703   .5171199    -3.01   0.003    -2.570566   -.5434934
       _cons |   2.579958   .5077419     5.08   0.000     1.584802    3.575114
------------------------------------------------------------------------------

. predict p_nocens if hearnt~=.;
(option p assumed; Pr(nocensor))
(64 missing values generated)

. reg hearnt nonwhite age3544 age4554 hsdrop somecol ba ma;

      Source |       SS       df       MS              Number of obs =     238
-------------+------------------------------           F(  7,   230) =    7.60
       Model |  2107.38624     7  301.055177           Prob > F      =  0.0000
    Residual |  9108.61154   230  39.6026589           R-squared     =  0.1879
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  11215.9978   237  47.3248852           Root MSE      =  6.2931

------------------------------------------------------------------------------
      hearnt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |   2.311415   1.189112     1.94   0.053    -.0315299    4.654361
     age3544 |   1.679692    .990322     1.70   0.091    -.2715712    3.630955
     age4554 |  -.1666344   1.097281    -0.15   0.879    -2.328642    1.995373
      hsdrop |  -3.196323   1.687386    -1.89   0.059    -6.521033    .1283862
     somecol |   1.704046   1.070021     1.59   0.113     -.404251    3.812342
          ba |   5.236283   1.103802     4.74   0.000     3.061427     7.41114
          ma |   7.245846   1.685183     4.30   0.000     3.925477    10.56621
       _cons |   9.347556   1.015349     9.21   0.000     7.346982    11.34813
------------------------------------------------------------------------------
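The derivative formulas in the probit, DPROBIT and logit steps above, and in the Tobit calculation described in the last comment block, can be cross-checked outside Stata. The following is a minimal Python sketch that uses only the standard library (`statistics.NormalDist` needs Python 3.8+). All function names are hypothetical. They are not part of the do-file or of Stata.

The sketch reproduces the DPROBIT discrete change for hsdrop from three numbers in the output: the probit coefficient, the x-bar of hsdrop, and pred. P at x-bar. It also writes out the textbook Tobit marginal effect for right-censoring at $30, beta * Phi((30 - xb)/sigma), which uses the Tobit's own sigma. That formula is a swapped-in alternative to the do-file's approach of scaling the coefficient by a separate probit's probability of being uncensored. The per-observation indexes xb_p and xb_l are not printed in the log, so the average-derivative helpers are shown without data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_derivative(xb, beta):
    """Probit: dPr(y=1)/dx = phi(xb) * beta, as in normd(xb_p)*_b[hsdrop]."""
    return STD_NORMAL.pdf(xb) * beta


def logistic_cdf(xb):
    """Numerically stable logistic CDF, exp(xb) / (1 + exp(xb))."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)


def logit_derivative(xb, beta):
    """Logit: exp(xb)/(1+exp(xb))^2 * beta, written as L(xb)*(1-L(xb))*beta."""
    cdf = logistic_cdf(xb)
    return cdf * (1.0 - cdf) * beta


def average_derivative(indexes, beta, derivative):
    """Mean of the observation-level derivatives (the gen ... / sum step)."""
    return sum(derivative(xb, beta) for xb in indexes) / len(indexes)


def probit_change_at_means(pred_p_at_means, beta, dummy_mean):
    """DPROBIT-style effect of a dummy: Phi(index with dummy=1) minus
    Phi(index with dummy=0), other regressors held at their means."""
    index_at_means = STD_NORMAL.inv_cdf(pred_p_at_means)
    index_dummy0 = index_at_means - beta * dummy_mean
    return STD_NORMAL.cdf(index_dummy0 + beta) - STD_NORMAL.cdf(index_dummy0)


def tobit_effect(beta, xb, sigma, upper):
    """Tobit right-censored at `upper`: dE[y|x]/dx = beta * Phi((upper - xb)/sigma)."""
    return beta * STD_NORMAL.cdf((upper - xb) / sigma)


# DPROBIT check: hsdrop coefficient, x-bar of hsdrop, pred. P at x-bar.
dprobit_hsdrop = probit_change_at_means(0.6249552, -0.6628425, 0.119205)
print(f"DPROBIT-style hsdrop effect: {dprobit_hsdrop:.4f}")  # log reports -.2591272

# Tobit: hsdrop coefficient, _cons as the index of the omitted group
# (all dummies 0), _se as sigma, and the ul(30) censoring point.
tobit_base = tobit_effect(-3.216842, 9.324924, 6.482194, 30.0)
print(f"Tobit hsdrop effect, omitted group: {tobit_base:.3f}")

# The do-file's version: averaging p_nocens * beta equals beta times
# the mean of p_nocens.
print(f"Log-style average: {-3.216842 * 0.949423:.6f}")  # log reports -3.054144
```

Because hsdrop is a dummy, the DPROBIT discrete change (about -.259) and the probit average derivative (-.2446 in the log) answer slightly different questions. That is why the two numbers do not coincide.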
. tobit hearnt nonwhite age3544 age4554 hsdrop somecol ba ma if hearnt~=.,
> ul(30);

Tobit estimates                                   Number of obs   =        238
                                                  LR chi2(7)      =      50.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -760.13027                       Pseudo R2       =     0.0320

------------------------------------------------------------------------------
      hearnt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nonwhite |   2.556907   1.233739     2.07   0.039     .1260875    4.987726
     age3544 |   1.736634   1.023125     1.70   0.091    -.2792147    3.752482
     age4554 |  -.2355487   1.132094    -0.21   0.835    -2.466099    1.995002
      hsdrop |  -3.216842   1.738259    -1.85   0.066     -6.64171    .2080261
     somecol |   1.709356   1.102675     1.55   0.122    -.4632307    3.881942
          ba |   5.496144   1.141849     4.81   0.000     3.246375    7.745914
          ma |   7.861011   1.760888     4.46   0.000     4.391557    11.33046
       _cons |   9.324924   1.047129     8.91   0.000      7.26178    11.38807
-------------+----------------------------------------------------------------
         _se |   6.482194   .3101585           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:        226     uncensored observations
                        12 right-censored observations at hearnt>=30

. gen d_hsd_t=p_nocens*_b[hsdrop] if hearnt~=.;
(64 missing values generated)

. sum hearnt nocensor p_nocens d_hsd_t;

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      hearnt |     238    12.33731    6.879308          4         30
    nocensor |     238    .9495798    .2192715          0          1
    p_nocens |     238     .949423    .0777544   .4859904   .9967642
     d_hsd_t |     238   -3.054144    .2501237  -3.206433  -1.563354

. ***********************************************************
> Here I run a Heckman two-step selection model, where I use
> the state unemployment rate, whether the woman has kids,
> whether she is married, and the married-with-kids
> interaction as additional regressors in the selection
> equation.
> ***********************************************************;

. gen hearn2=hearn;
(64 missing values generated)

. replace hearn2=. if hearn<4;
(0 real changes made)
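The TWO option asks HECKMAN for the two-step estimator. It first fits a probit for whether hourly earnings are observed. It then adds the inverse Mills ratio, lambda(z) = phi(z)/Phi(z) evaluated at each woman's selection index z = Z*gamma, as an extra regressor in the earnings OLS.

Here is a minimal Python sketch of that first-step quantity. It uses some of the select() coefficients that HECKMAN reports in the output that follows, applied to a hypothetical woman: every dummy is at its omitted category, and her state's unemployment rate is the sample mean of 5.433444. The helper names `inverse_mills_ratio` and `selection_index` are illustrative, not Stata functions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills_ratio(z_gamma):
    """lambda(z) = phi(z) / Phi(z), evaluated at a selection-equation index."""
    return STD_NORMAL.pdf(z_gamma) / STD_NORMAL.cdf(z_gamma)


def selection_index(gamma, z):
    """Linear index Z*gamma. `gamma` and `z` are dicts keyed by variable
    name; '_cons' is the constant and absent regressors count as 0."""
    return gamma["_cons"] + sum(
        coef * z.get(name, 0.0) for name, coef in gamma.items() if name != "_cons"
    )


# A subset of the select() coefficients from the HECKMAN output.
gamma = {"_cons": 2.121845, "ur": -0.1355538, "married": -0.6773144,
         "kids": -0.1599931, "mkids": 0.2491036}

# Hypothetical woman: omitted group on every dummy, state unemployment
# rate at the sample mean from the summary statistics.
z = {"ur": 5.433444}
index = selection_index(gamma, z)
print(f"selection index {index:.4f}, inverse Mills ratio {inverse_mills_ratio(index):.4f}")
```

The ratio falls as the selection index rises. Women who are very likely to work get only a small correction, and those near the margin of working get a large one.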
. heckman hearn2 nonwhite age3544 age4554 hsdrop somecol ba ma,
> select(nonwhite age3544 age4554 hsdrop somecol ba ma ur kids married
> mkids) two;

Heckman selection model -- two-step estimates   Number of obs      =       302
(regression model with sample selection)        Censored obs       =        64
                                                Uncensored obs     =       238

                                                Wald chi2(14)      =     36.05
                                                Prob > chi2        =    0.0010

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hearn2       |
    nonwhite |   2.745514    6.68239     0.41   0.681    -10.35173    15.84276
     age3544 |  -5.170534   5.424399    -0.95   0.340    -15.80216    5.461094
     age4554 |  -7.732561   6.141906    -1.26   0.208    -19.77048    4.305355
      hsdrop |  -17.08646   13.74441    -1.24   0.214    -44.02501    9.852092
     somecol |  -.1152703   5.895579    -0.02   0.984    -11.67039    11.43985
          ba |    13.7862   6.088794     2.26   0.024      1.85238    25.72001
          ma |   19.18898   9.512108     2.02   0.044     .5455935    37.83237
       _cons |   7.979607   7.997605     1.00   0.318    -7.695409    23.65462
-------------+----------------------------------------------------------------
select       |
    nonwhite |  -.1890948   .2342915    -0.81   0.420    -.6482976     .270108
     age3544 |   .0712014   .2031654     0.35   0.726    -.3269954    .4693983
     age4554 |   .2255098    .245733     0.92   0.359     -.256118    .7071376
      hsdrop |  -1.155057   .2772432    -4.17   0.000    -1.698444   -.6116705
     somecol |  -.0459749   .2332228    -0.20   0.844    -.5030832    .4111334
          ba |   -.094151   .2482057    -0.38   0.704    -.5806253    .3923233
          ma |   .3142624   .4272916     0.74   0.462    -.5232137    1.151738
          ur |  -.1355538   .0753939    -1.80   0.072    -.2833232    .0122156
        kids |  -.1599931   .3091709    -0.52   0.605    -.7659569    .4459706
     married |  -.6773144   .2989611    -2.27   0.023    -1.263267   -.0913613
       mkids |   .2491036   .3873227     0.64   0.520     -.510035    1.008242
       _cons |   2.121845   .4961256     4.28   0.000     1.149457    3.094233
-------------+----------------------------------------------------------------
mills        |
      lambda |   23.94615   19.36133     1.24   0.216    -14.00135    61.89366
-------------+----------------------------------------------------------------
         rho |    0.67307
       sigma |  35.577696
      lambda |   23.94615   19.36133
------------------------------------------------------------------------------

. end of do-file
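In the two-step estimator, the coefficient on the inverse Mills ratio (the "lambda" row) estimates rho times sigma. The reported rho should therefore equal lambda divided by sigma. Below is a short Python check of that relationship, using the numbers printed in the HECKMAN output. The function name `rho_from_two_step` is hypothetical.

```python
def rho_from_two_step(lambda_coef, sigma):
    """Heckman two-step: the coefficient on the inverse Mills ratio
    estimates rho * sigma, so rho is recovered as lambda / sigma."""
    return lambda_coef / sigma


# Values copied from the "mills" block and the rho/sigma lines above.
lambda_coef, lambda_se, sigma = 23.94615, 19.36133, 35.577696

rho = rho_from_two_step(lambda_coef, sigma)
z_lambda = lambda_coef / lambda_se  # z test of lambda = 0, i.e. no selection bias

print(f"rho = {rho:.5f}, z for lambda = {z_lambda:.2f}")  # rho = 0.67307, z for lambda = 1.24
```

The ratio reproduces the reported rho of 0.67307. The z statistic of about 1.24 (p = 0.216 in the output) means the evidence of selection bias in hourly earnings is weak in this sample.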