-------------------------------------------------------------------------------
       log:  /afs/uncg.edu/html/bae/people/rosenbaum/643/lab1.log
  log type:  text
 opened on:  23 Aug 2006, 22:13:04

. ***********************************************************
> LAB1.DO is a STATA do-file that inputs data from
> the 1997 March CPS, runs regressions, and computes
> test statistics. It will be used for the first two labs.
>   - written by Dan Rosenbaum, 2006
>
> Located at my course web-site
> (http://www.uncg.edu/bae/people/rosenbaum/Eco643/main.html)
> is a file containing a 1 in 100 subsample of all persons
> in the 1997 March CPS who are 25-54 and not in the armed
> forces. The file is named cps97.raw and is stored as
> space-delimited ASCII. Here is a short description of
> the variables in the order that they are found in the data.
>
> AGE      = age in years
> RACE     = 1 if white
>          = 2 if black
>          = 3 if other
> FEMALE   = 1 if female, 0 otherwise
> EDUCATT  = 11 if high school dropout
>          = 12 if high school graduate
>          = 14 if some college
>          = 16 if bachelor's degree
>          = 18 if master's degree or above
> EARN     = annual earnings in nominal dollars
> WEEKS    = total weeks worked last year
> HOURS    = total hours worked last year
> NUMKID   = number of children
> MS       = 1 if married with spouse present
>          = 2 if married but spouse absent
>          = 3 if separated
>          = 4 if divorced
>          = 5 if never married
>          = 6 if widowed
> WGT      = March Supplement Weight
> YEAR     = year (four digits) in July of previous year
> STATE    = state of residence, alphabetical order (1-51)
> INSCHOOL = 1 if attending school, 0 otherwise
> UR       = state unemployment rate (in percentage points)
> ***********************************************************;

. ***********************************************************
> I start by inputting the data using an INFILE statement,
> since the data is space-delimited rather than tab-delimited.
> I also calculate summary statistics for the sample.
> ***********************************************************;

. infile age race female educatt earn weeks hours numkid ms
>         wgt year state inschool ur using cps97;
(560 observations read)

. sum;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       560    38.64286    8.130485         25         54
        race |       560    1.194643    .5210831          1          3
      female |       560         .55    .4979385          0          1
     educatt |       560    13.73393    2.089823         11         18
        earn |       560    26247.52    28289.89          0   171998.5
-------------+--------------------------------------------------------
       weeks |       560    39.97321    20.04729          0         52
       hours |       560    1663.879    963.6095          0       3952
      numkid |       560    .9535714    1.195394          0          8
          ms |       560    2.282143    1.705207          1          6
         wgt |       560    2090.138    1010.363     222.61    6623.81
-------------+--------------------------------------------------------
        year |       560        1996           0       1996       1996
       state |       560    25.45893    14.30758          1         51
    inschool |       560       .0125    .1112018          0          1
          ur |       560    5.426429    1.131552        2.9        8.5

. ***********************************************************
> Here I create a variable, giving the log of hourly
> earnings. Note that it is undefined for those who have
> not worked during the last year. Also, I restrict the
> sample to those with hourly earnings between $4 and
> $100.
> ***********************************************************;

. gen hearn=earn/hours if earn>0 & hours>0;
(95 missing values generated)

. replace hearn=. if hearn>100 | hearn<4;
(23 real changes made, 23 to missing)

. gen lnhearn=log(hearn);
(118 missing values generated)

. ***********************************************************
> Here I regress hourly earnings onto education in a number
> of different ways. We will talk about how to interpret
> the education coefficient in these different cases.
> ***********************************************************;

.
reg hearn educatt;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   80.20
       Model |  8926.38432     1  8926.38432           Prob > F      =  0.0000
    Residual |  48971.3611   440  111.298548           R-squared     =  0.1542
-------------+------------------------------           Adj R-squared =  0.1523
       Total |  57897.7454   441  131.287405           Root MSE      =   10.55

------------------------------------------------------------------------------
       hearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   2.136898   .2386111     8.96   0.000     1.667939    2.605857
       _cons |  -14.06206   3.361486    -4.18   0.000    -20.66862   -7.455493
------------------------------------------------------------------------------

. /* level level model */
>
> gen hearnc=hearn*100;
(118 missing values generated)

. /* hourly earnings in cents per hour */
> reg hearnc educatt;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   80.20
       Model |  89263843.2     1  89263843.2           Prob > F      =  0.0000
    Residual |   489713617   440  1112985.49           R-squared     =  0.1542
-------------+------------------------------           Adj R-squared =  0.1523
       Total |   578977460   441  1312874.06           Root MSE      =    1055

------------------------------------------------------------------------------
      hearnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   213.6898   23.86111     8.96   0.000     166.7939    260.5857
       _cons |  -1406.206   336.1487    -4.18   0.000    -2066.862   -745.5493
------------------------------------------------------------------------------

. gen eddays=educatt*180;

.
/* education in days */
> reg hearn eddays;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   80.20
       Model |  8926.38432     1  8926.38432           Prob > F      =  0.0000
    Residual |  48971.3611   440  111.298548           R-squared     =  0.1542
-------------+------------------------------           Adj R-squared =  0.1523
       Total |  57897.7454   441  131.287405           Root MSE      =   10.55

------------------------------------------------------------------------------
       hearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      eddays |   .0118717   .0013256     8.96   0.000     .0092663     .014477
       _cons |  -14.06206   3.361486    -4.18   0.000    -20.66862   -7.455493
------------------------------------------------------------------------------

. reg lnhearn educatt;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   82.40
       Model |  23.5456264     1  23.5456264           Prob > F      =  0.0000
    Residual |  125.735884   440  .285763372           R-squared     =  0.1577
-------------+------------------------------           Adj R-squared =  0.1558
       Total |   149.28151   441  .338506826           Root MSE      =  .53457

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   .1097491   .0120906     9.08   0.000     .0859866    .1335117
       _cons |    1.04118   .1703295     6.11   0.000     .7064195    1.375941
------------------------------------------------------------------------------

. /* log level model */
>
> gen lnhearnc=log(hearnc);
(118 missing values generated)

.
reg lnhearnc educatt;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   82.40
       Model |  23.5456289     1  23.5456289           Prob > F      =  0.0000
    Residual |  125.735892   440  .285763392           R-squared     =  0.1577
-------------+------------------------------           Adj R-squared =  0.1558
       Total |  149.281521   441  .338506851           Root MSE      =  .53457

------------------------------------------------------------------------------
    lnhearnc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   .1097491   .0120906     9.08   0.000     .0859866    .1335117
       _cons |    5.64635   .1703295    33.15   0.000      5.31159    5.981111
------------------------------------------------------------------------------

. /* will educatt coefficient change? */
>
> gen lneduc=log(educatt);

. reg lnhearn lneduc;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   440) =   79.66
       Model |  22.8845704     1  22.8845704           Prob > F      =  0.0000
    Residual |   126.39694   440  .287265772           R-squared     =  0.1533
-------------+------------------------------           Adj R-squared =  0.1514
       Total |   149.28151   441  .338506826           Root MSE      =  .53597

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lneduc |   1.533835   .1718499     8.93   0.000     1.196086    1.871584
       _cons |  -1.453145   .4514676    -3.22   0.001    -2.340446   -.5658443
------------------------------------------------------------------------------

. /* log log model */
>
> ***********************************************************
> Here I generate age squared, which will make it possible
> for me to test whether age has a non-linear
> effect on log hourly earnings.
> ***********************************************************;

. gen age2=age*age;

.
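The regressions above differ only in how hourly earnings and education are scaled, so their education coefficients can be recovered from one another by arithmetic. The sketch below is not part of LAB1.DO. It copies the reported estimates into scalars (the scalar names are made up for illustration) and uses Stata's default line delimiter rather than the do-file's semicolons.

```stata
* Estimates copied from the regression output in the log.
* Scalar names are illustrative and do not appear in LAB1.DO.
scalar b_level = 2.136898
scalar b_log   = .1097491
scalar a_log   = 1.04118

* Dependent variable times 100 (dollars to cents): slope times 100.
display "slope, hearnc on educatt: " (b_level*100)

* Regressor times 180 (years to days): slope divided by 180.
display "slope, hearn on eddays:   " (b_level/180)

* log(100 x hearn) = log(100) + log(hearn):
* the slope is unchanged and only the intercept shifts.
display "intercept, lnhearnc:      " (a_log + ln(100))

* Log-level slope as an exact percentage change per year of schooling.
display "exact % change per year:  " (100*(exp(b_log) - 1))
```

The first three values should agree with the reported 213.6898, .0118717 and 5.64635 up to rounding. The log-log slope of 1.533835 needs no conversion, since it is already an elasticity.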
reg lnhearn educatt age age2;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =   32.90
       Model |  27.4532197     3  9.15107324           Prob > F      =  0.0000
    Residual |  121.828291   438  .278146782           R-squared     =  0.1839
-------------+------------------------------           Adj R-squared =  0.1783
       Total |   149.28151   441  .338506826           Root MSE      =   .5274

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   .1063965   .0119619     8.89   0.000     .0828866    .1299064
         age |   .0648905   .0317905     2.04   0.042     .0024097    .1273713
        age2 |  -.0006957   .0004044    -1.72   0.086    -.0014905    .0000991
       _cons |  -.3371967   .6204354    -0.54   0.587    -1.556597    .8822037
------------------------------------------------------------------------------

. ***********************************************************
> Here I do partial regression in order to demonstrate how
> variation is used in a multivariate OLS setting.
> PREDICT with the RESID option creates a variable giving
> the residual values. In the first PREDICT statement, this
> variable is named EHATHE.
> ***********************************************************;

. reg lnhearn age age2;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  2,   439) =    8.31
       Model |  5.44792716     2  2.72396358           Prob > F      =  0.0003
    Residual |  143.833583   439  .327639141           R-squared     =  0.0365
-------------+------------------------------           Adj R-squared =  0.0321
       Total |   149.28151   441  .338506826           Root MSE      =   .5724

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0765955   .0344735     2.22   0.027     .0088418    .1443491
        age2 |  -.0008217   .0004387    -1.87   0.062    -.0016838    .0000404
       _cons |   .8886548   .6565522     1.35   0.177    -.4017213    2.179031
------------------------------------------------------------------------------

. predict ehathe,resid;
(118 missing values generated)

. reg educatt age age2 if lnhearn~=.;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  2,   439) =    1.23
       Model |  10.9319035     2  5.46595176           Prob > F      =  0.2920
    Residual |  1943.89389   439   4.4280043           R-squared     =  0.0056
-------------+------------------------------           Adj R-squared =  0.0011
       Total |  1954.82579   441  4.43271155           Root MSE      =  2.1043

------------------------------------------------------------------------------
     educatt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1100126   .1267335     0.87   0.386    -.1390673    .3590925
        age2 |  -.0011844   .0016126    -0.73   0.463    -.0043538     .001985
       _cons |   11.52154   2.413656     4.77   0.000     6.777784     16.2653
------------------------------------------------------------------------------

. predict ehated,resid;

. reg ehathe ehated, noc;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  1,   441) =   79.66
       Model |  22.0052924     1  22.0052924           Prob > F      =  0.0000
    Residual |   121.82829   441  .276254626           R-squared     =  0.1530
-------------+------------------------------           Adj R-squared =  0.1511
       Total |  143.833582   442  .325415344           Root MSE      =   .5256

------------------------------------------------------------------------------
      ehathe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ehated |   .1063965   .0119212     8.93   0.000     .0829671    .1298258
------------------------------------------------------------------------------

.
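The residual-on-residual regression above reproduces the educatt coefficient from the full model (.1063965). Its standard error (.0119212) is slightly smaller than the full model's .0119619 because it is computed with 441 rather than 438 residual degrees of freedom. A minimal sketch of the same three steps follows, run on the auto dataset that ships with Stata so that it works without cps97.raw. The dataset, the choice of variables, and the scalar names are assumptions made for illustration and are not part of LAB1.DO.

```stata
* Illustration on Stata's shipped example data, not on cps97.raw.
sysuse auto, clear

* Full model: price on mpg, controlling for weight.
regress price mpg weight
scalar b_full  = _b[mpg]
scalar se_full = _se[mpg]
scalar df_full = e(df_r)

* Step 1: residualize the outcome on the control.
regress price weight
predict double e_price, residuals

* Step 2: residualize the regressor of interest on the control.
regress mpg weight
predict double e_mpg, residuals

* Step 3: regress residuals on residuals, without a constant.
regress e_price e_mpg, noconstant

* The slope matches the full model.
display "full slope: " b_full "   partial slope: " _b[e_mpg]

* The standard errors match after a degrees-of-freedom correction.
scalar se_adj = _se[e_mpg]*sqrt(e(df_r)/df_full)
display "full se: " se_full "   corrected partial se: " se_adj
```

Applying the same correction to the numbers in this log gives .0119212*sqrt(441/438), or about .01196, in line with the full model's standard error.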
reg ehathe ehated age age2;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =   26.37
       Model |  22.0052924     3  7.33509746           Prob > F      =  0.0000
    Residual |   121.82829   438  .278146781           R-squared     =  0.1530
-------------+------------------------------           Adj R-squared =  0.1472
       Total |  143.833582   441  .326153248           Root MSE      =   .5274

------------------------------------------------------------------------------
      ehathe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ehated |   .1063965   .0119619     8.89   0.000     .0828866    .1299064
         age |  -1.06e-10   .0317632    -0.00   1.000    -.0624273    .0624273
        age2 |  -5.67e-13   .0004042    -0.00   1.000    -.0007943    .0007943
       _cons |   5.70e-09   .6049345     0.00   1.000    -1.188935    1.188935
------------------------------------------------------------------------------

. reg lnhearn educatt age age2;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =   32.90
       Model |  27.4532197     3  9.15107324           Prob > F      =  0.0000
    Residual |  121.828291   438  .278146782           R-squared     =  0.1839
-------------+------------------------------           Adj R-squared =  0.1783
       Total |   149.28151   441  .338506826           Root MSE      =   .5274

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   .1063965   .0119619     8.89   0.000     .0828866    .1299064
         age |   .0648905   .0317905     2.04   0.042     .0024097    .1273713
        age2 |  -.0006957   .0004044    -1.72   0.086    -.0014905    .0000991
       _cons |  -.3371967   .6204354    -0.54   0.587    -1.556597    .8822037
------------------------------------------------------------------------------

. ***********************************************************
> Here I perform a variety of tests.
> ***********************************************************;

. test educatt;

 ( 1)  educatt = 0

       F(  1,   438) =   79.11
            Prob > F =    0.0000

.
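With a single restriction, the F statistic reported by TEST is the square of the t statistic in the regression table, and the two p-values coincide. The sketch below checks this by hand from the reported educatt estimates. It is not part of LAB1.DO, the scalar names are illustrative, and it uses the default line delimiter.

```stata
* Estimates for educatt copied from the regression table above.
scalar b_educ  = .1063965
scalar se_educ = .0119619
scalar t_educ  = b_educ/se_educ

* Squared t statistic: compare with the reported F(1, 438) = 79.11.
display "t squared: " (t_educ^2)

* Upper-tail F probability and two-sided t probability are the same number.
display "p from F:  " Ftail(1, 438, t_educ^2)
display "p from t:  " (2*ttail(438, abs(t_educ)))
```

The table shows t = 8.89 only to two decimals, so squaring the printed value gives 79.03 rather than 79.11. Dividing the coefficient by its standard error recovers the extra digits.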
/* a simple t-test */
>
> test age age2;

 ( 1)  age = 0
 ( 2)  age2 = 0

       F(  2,   438) =    7.02
            Prob > F =    0.0010

. /* testing the joint significance of the
>    age coefficients */
> test age=0;

 ( 1)  age = 0

       F(  1,   438) =    4.17
            Prob > F =    0.0418

. test age2=0,a;

 ( 1)  age = 0
 ( 2)  age2 = 0

       F(  2,   438) =    7.02
            Prob > F =    0.0010

. test educatt age age2;

 ( 1)  educatt = 0
 ( 2)  age = 0
 ( 3)  age2 = 0

       F(  3,   438) =   32.90
            Prob > F =    0.0000

. /* testing overall significance */
>
> gen lnh_ed=lnhearn-0.1*educatt;
(118 missing values generated)

. reg lnh_ed;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  0,   441) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |  125.921682   441  .285536693           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  125.921682   441  .285536693           Root MSE      =  .53436

------------------------------------------------------------------------------
      lnh_ed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   1.176984   .0254167    46.31   0.000     1.127031    1.226937
------------------------------------------------------------------------------

. /* the restricted model for a test with three restrictions */
>
> reg lnhearn educatt age age2;

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =   32.90
       Model |  27.4532197     3  9.15107324           Prob > F      =  0.0000
    Residual |  121.828291   438  .278146782           R-squared     =  0.1839
-------------+------------------------------           Adj R-squared =  0.1783
       Total |   149.28151   441  .338506826           Root MSE      =   .5274

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     educatt |   .1063965   .0119619     8.89   0.000     .0828866    .1299064
         age |   .0648905   .0317905     2.04   0.042     .0024097    .1273713
        age2 |  -.0006957   .0004044    -1.72   0.086    -.0014905    .0000991
       _cons |  -.3371967   .6204354    -0.54   0.587    -1.556597    .8822037
------------------------------------------------------------------------------

. test educatt=0.1;

 ( 1)  educatt = .1

       F(  1,   438) =    0.29
            Prob > F =    0.5931

. test age age2, ac;

 ( 1)  educatt = .1
 ( 2)  age = 0
 ( 3)  age2 = 0

       F(  3,   438) =    4.91
            Prob > F =    0.0023

. ***********************************************************
> Here I test whether log hourly earnings increase with age
> at various age levels: 25, 40, and 55.
> Note that with a quadratic age term, the marginal effect
> of age is given by b[age] + 2*age*b[age2].
> ***********************************************************;

. gen d25=_b[age]+2*25*_b[age2];

. sum d25;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         d25 |       560     .030106           0    .030106    .030106

. test age+50*age2=0;

 ( 1)  age + 50 age2 = 0

       F(  1,   438) =    6.47
            Prob > F =    0.0113

. gen d40=_b[age]+2*40*_b[age2];

. sum d40;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         d40 |       560    .0092352           0   .0092352   .0092352

. test age+80*age2=0;

 ( 1)  age + 80 age2 = 0

       F(  1,   438) =    8.20
            Prob > F =    0.0044

. gen d55=_b[age]+2*55*_b[age2];

. sum d55;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         d55 |       560   -.0116355           0  -.0116355  -.0116355

. test age+110*age2=0;

 ( 1)  age + 110 age2 = 0

       F(  1,   438) =    0.77
            Prob > F =    0.3796

.
end of do-file
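Two results near the end of the log can be reproduced by hand from numbers already printed: the marginal effects of age stored in d25, d40 and d55, and the F statistic for the three-restriction test, which compares the residual sum of squares from "reg lnh_ed" (restricted) with the one from the full regression (unrestricted). The sketch below is not part of LAB1.DO. The scalar names are illustrative and the default line delimiter is used.

```stata
* Coefficients from the full regression of lnhearn, copied from the log.
scalar b_age  = .0648905
scalar b_age2 = -.0006957

* Marginal effect of age, b[age] + 2 x age x b[age2], at three ages.
scalar me25 = b_age + 2*25*b_age2
scalar me40 = b_age + 2*40*b_age2
scalar me55 = b_age + 2*55*b_age2
display "at 25: " me25 "   at 40: " me40 "   at 55: " me55

* Age at which the fitted age profile peaks (marginal effect equals zero).
display "peak age: " (-b_age/(2*b_age2))

* F test of educatt = .1, age = 0, age2 = 0 from residual sums of squares:
* 3 restrictions, 438 residual degrees of freedom in the full model.
scalar ssr_r  = 125.921682
scalar ssr_u  = 121.828291
scalar F_hand = ((ssr_r - ssr_u)/3)/(ssr_u/438)
display "F(3, 438) = " F_hand
display "p-value   = " Ftail(3, 438, F_hand)
```

Because the coefficients are copied at the precision shown in the table, the marginal effects match d25, d40 and d55 only to about five decimal places. The hand-computed F should round to the 4.91 reported by "test age age2, ac".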