-------------------------------------------------------------------------------
       log:  /afs/uncg.edu/html/bae/people/rosenbaum/643/lab2.log
  log type:  text
 opened on:   8 Sep 2006, 00:08:58

.
. /**********************************************************
> LAB2.DO is a STATA do-file that inputs data from
> the 1997 March CPS, runs a series of regressions focusing
> on the interpretation of qualitative independent
> variables.
>                     - written by Dan Rosenbaum, 2006
>
> Located at my website course
> (http://www.uncg.edu/bae/people/rosenbaum/Eco643/main.html)
> is a file containing a 1 in 100 subsample of all persons
> in the 1997 March CPS who are 25-54 and not in the armed
> forces.  The file is named cps97.raw and is stored
> space-delimited ASCII.  Here is a short description of
> the variables in the order that they are found in the data.
>
>   AGE      = age in years
>   RACE     = 1 if white
>            = 2 if black
>            = 3 if other
>   FEMALE   = 1 if female, 0 otherwise
>   EDUCATT  = 11 if high school dropout
>            = 12 if high school graduate
>            = 14 if some college
>            = 16 if bachelors degree
>            = 18 if masters degree or above
>   EARN     = annual earnings in nominal dollars
>   WEEKS    = total weeks worked last year
>   HOURS    = total hours worked last year
>   NUMKID   = number of children
>   MS       = 1 if married with spouse present
>            = 2 if married but spouse absent
>            = 3 if separated
>            = 4 if divorced
>            = 5 if never married
>            = 6 if widowed
>   WGT      = March Supplement Weight
>   YEAR     = year (four digits) in July of previous year
>   STATE    = state of residence, alphabetical order (1-51)
>   INSCHOOL = 1 if attending school, 0 otherwise
>   UR       = state unemployment rate (in percentage points)
> **********************************************************/
.
. /**********************************************************
> I start by inputting the data using an INFILE statement,
> since the data is space-delimited rather than tab-delimited.
> I also calculate summary statistics for the sample.
> **********************************************************/
.
. infile age race female educatt earn weeks hours numkid ms ///
>        wgt year state inschool ur using cps97
(560 observations read)

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       560    38.64286    8.130485         25         54
        race |       560    1.194643    .5210831          1          3
      female |       560         .55    .4979385          0          1
     educatt |       560    13.73393    2.089823         11         18
        earn |       560    26247.52    28289.89          0   171998.5
-------------+--------------------------------------------------------
       weeks |       560    39.97321    20.04729          0         52
       hours |       560    1663.879    963.6095          0       3952
      numkid |       560    .9535714    1.195394          0          8
          ms |       560    2.282143    1.705207          1          6
         wgt |       560    2090.138    1010.363     222.61    6623.81
-------------+--------------------------------------------------------
        year |       560        1996           0       1996       1996
       state |       560    25.45893    14.30758          1         51
    inschool |       560       .0125    .1112018          0          1
          ur |       560    5.426429    1.131552        2.9        8.5

.
. /**********************************************************
> Here I create a variable, giving the log of hourly
> earnings.  Note that it is undefined for those who have
> not worked during the last year.  Also, I restrict the
> sample to those with hourly earnings between $4 and
> $100.
> **********************************************************/
.
. gen hearn=earn/hours
(94 missing values generated)

. replace hearn=. if hearn>100 | hearn<4
(24 real changes made, 24 to missing)

. gen lnhearn=log(hearn)
(118 missing values generated)

. sum hearn lnhearn

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       hearn |       442    15.70464    11.45807          4   83.33334
     lnhearn |       442    2.569971    .5818134   1.386294   4.422849

.
. /**********************************************************
> Here I regress log hourly earnings onto a female indicator
> and educational attainment.
> **********************************************************/
.
. reg lnhearn female educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  2,   439) =   60.54
       Model |   32.273535     2  16.1367675           Prob > F      =  0.0000
    Residual |  117.007975   439  .266532973           R-squared     =  0.2162
-------------+------------------------------           Adj R-squared =  0.2126
       Total |   149.28151   441  .338506826           Root MSE      =  .51627

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2811702   .0491349    -5.72   0.000     -.377739   -.1846014
     educatt |   .1106007   .0116777     9.47   0.000     .0876496    .1335519
       _cons |   1.173719   .1661211     7.07   0.000     .8472276    1.500211
------------------------------------------------------------------------------

.
. /**********************************************************
> Here I regress log hourly earnings onto male and female
> indicators and education, leaving out the intercept term.
> (The first REG statement incorrectly includes both
> indicators and the intercept.)
> **********************************************************/
.
. gen male=1-female
. reg lnhearn male female educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  2,   439) =   60.54
       Model |   32.273535     2  16.1367675           Prob > F      =  0.0000
    Residual |  117.007975   439  .266532973           R-squared     =  0.2162
-------------+------------------------------           Adj R-squared =  0.2126
       Total |   149.28151   441  .338506826           Root MSE      =  .51627

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .2811702   .0491349     5.72   0.000     .1846014     .377739
      female |  (dropped)
     educatt |   .1106007   .0116777     9.47   0.000     .0876496    .1335519
       _cons |   .8925488   .1665365     5.36   0.000      .565241    1.219857
------------------------------------------------------------------------------

. reg lnhearn male female educatt, noc

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   439) = 3691.32
       Model |  2951.57242     3  983.857474           Prob > F      =  0.0000
    Residual |  117.007975   439  .266532973           R-squared     =  0.9619
-------------+------------------------------           Adj R-squared =  0.9616
       Total |   3068.5804   442  6.94248959           Root MSE      =  .51627

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.173719   .1661211     7.07   0.000     .8472276    1.500211
      female |   .8925488   .1665365     5.36   0.000      .565241    1.219857
     educatt |   .1106007   .0116777     9.47   0.000     .0876496    .1335519
------------------------------------------------------------------------------

.
. /**********************************************************
> Here I regress log hourly earnings onto three separate
> age category indicators and educational attainment.
> **********************************************************/
.
. gen a2534=age<=34
. gen a3544=age>=35 & age<=44
. gen a4554=age>=45 & age<=54
. reg lnhearn a2534 a3544 a4554 educatt, noc

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  4,   438) = 2651.19
       Model |  2946.86839     4  736.717097           Prob > F      =  0.0000
    Residual |   121.71201   438  .277881302           R-squared     =  0.9603
-------------+------------------------------           Adj R-squared =  0.9600
       Total |   3068.5804   442  6.94248959           Root MSE      =  .52714

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a2534 |   .9472934   .1698924     5.58   0.000     .6133877    1.281199
       a3544 |   1.135424   .1716452     6.61   0.000     .7980732    1.472774
       a4554 |   1.162465   .1750178     6.64   0.000     .8184859    1.506444
     educatt |   .1071665   .0119436     8.97   0.000     .0836925    .1306404
------------------------------------------------------------------------------

. reg lnhearn a3544 a4554 educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =   33.07
       Model |  27.5694998     3  9.18983327           Prob > F      =  0.0000
    Residual |   121.71201   438  .277881302           R-squared     =  0.1847
-------------+------------------------------           Adj R-squared =  0.1791
       Total |   149.28151   441  .338506826           Root MSE      =  .52714

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a3544 |   .1881303   .0587702     3.20   0.001     .0726236     .303637
       a4554 |   .2151715   .0644468     3.34   0.001     .0885082    .3418348
     educatt |   .1071665   .0119436     8.97   0.000     .0836925    .1306404
       _cons |   .9472934   .1698924     5.58   0.000     .6133877    1.281199
------------------------------------------------------------------------------

.
. /**********************************************************
> Here I regress log hourly earnings onto a female indicator
> and a married indicator, female*married interaction,
> and educational attainment.
> **********************************************************/
.
. gen married=ms==1 | ms==2
. gen fmarried=female*married
. reg lnhearn female married fmarried educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  4,   437) =   33.29
       Model |  34.8622754     4  8.71556884           Prob > F      =  0.0000
    Residual |  114.419235   437  .261828913           R-squared     =  0.2335
-------------+------------------------------           Adj R-squared =  0.2265
       Total |   149.28151   441  .338506826           Root MSE      =  .51169

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.1382657   .0815453    -1.70   0.091    -.2985355     .022004
     married |   .2328217   .0744724     3.13   0.002     .0864532    .3791902
    fmarried |  -.2094284   .1018193    -2.06   0.040    -.4095448    -.009312
     educatt |   .1105331   .0115751     9.55   0.000     .0877833    .1332828
       _cons |   1.017641   .1721252     5.91   0.000     .6793446    1.355937
------------------------------------------------------------------------------

.
. /**********************************************************
> Here I regress log hourly earnings on a female indicator
> age and educational attainment, allowing for an effect of
> age on hourly earnings that differs for males and females.
> **********************************************************/
.
. gen fage=female*age
. gen mage=male*age
. reg lnhearn female age fage educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  4,   437) =   40.91
       Model |  40.6689091     4  10.1672273           Prob > F      =  0.0000
    Residual |  108.612601   437  .248541421           R-squared     =  0.2724
-------------+------------------------------           Adj R-squared =  0.2658
       Total |   149.28151   441  .338506826           Root MSE      =  .49854

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .7230892   .2347466     3.08   0.002     .2617166    1.184462
         age |   .0243453    .004201     5.80   0.000     .0160887    .0326019
        fage |  -.0261514   .0059386    -4.40   0.000    -.0378232   -.0144796
     educatt |   .1071541   .0113019     9.48   0.000     .0849412     .129367
       _cons |   .2881348   .2213232     1.30   0.194    -.1468554     .723125
------------------------------------------------------------------------------

. reg lnhearn female mage fage educatt

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  4,   437) =   40.91
       Model |  40.6689091     4  10.1672273           Prob > F      =  0.0000
    Residual |  108.612601   437  .248541421           R-squared     =  0.2724
-------------+------------------------------           Adj R-squared =  0.2658
       Total |   149.28151   441  .338506826           Root MSE      =  .49854

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .7230892   .2347466     3.08   0.002     .2617166    1.184462
        mage |   .0243453    .004201     5.80   0.000     .0160887    .0326019
        fage |  -.0018061   .0042062    -0.43   0.668     -.010073    .0064608
     educatt |   .1071541   .0113019     9.48   0.000     .0849412     .129367
       _cons |   .2881348   .2213232     1.30   0.194    -.1468554     .723125
------------------------------------------------------------------------------

.
. /**********************************************************
> Here I regress log hourly earnings onto a nonlinear
> function of age that allows for different slopes for
> different age groups.  The intercept is assumed to be the
> same for all groups.  I graph the predicted values against
> age.
> **********************************************************/
.
. gen age2534=age*a2534
. gen age3544=age*a3544
. gen age4554=age*a4554
. reg lnhearn age2534 age3544 age4554

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =    5.24
       Model |   5.1754241     3  1.72514137           Prob > F      =  0.0015
    Residual |  144.106086   438  .329009329           R-squared     =  0.0347
-------------+------------------------------           Adj R-squared =  0.0281
       Total |   149.28151   441  .338506826           Root MSE      =  .57359

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age2534 |   .0051448   .0118763     0.43   0.665    -.0181968    .0284863
     age3544 |   .0091287   .0090423     1.01   0.313    -.0086429    .0269004
     age4554 |   .0080569   .0073007     1.10   0.270     -.006292    .0224057
       _cons |   2.272206   .3555961     6.39   0.000     1.573319    2.971092
------------------------------------------------------------------------------

. predict lnhehat1
(option xb assumed; fitted values)

. **graph twoway scatter lnhehat1 age
.
. /**********************************************************
> Here I regress log hourly earnings onto a nonlinear
> function that allows the slope and intercept to differ
> by age group.  I graph the predicted values against age.
> *********************************************************/
.
. reg lnhearn a3544 a4554 age2534 age3544 age4554

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  5,   436) =    3.65
       Model |  5.99412488     5  1.19882498           Prob > F      =  0.0030
    Residual |  143.287385   436  .328640792           R-squared     =  0.0402
-------------+------------------------------           Adj R-squared =  0.0291
       Total |   149.28151   441  .338506826           Root MSE      =  .57327

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       a3544 |   .2623057   .7878976     0.33   0.739    -1.286244    1.810855
       a4554 |   1.742162    1.10543     1.58   0.116    -.4304725    3.914796
     age2534 |   .0152957   .0159495     0.96   0.338    -.0160517    .0466431
     age3544 |   .0102444    .015818     0.65   0.518    -.0208446    .0413333
     age4554 |  -.0211004   .0202582    -1.04   0.298    -.0609164    .0187155
       _cons |   1.965688   .4793683     4.10   0.000     1.023528    2.907848
------------------------------------------------------------------------------

. predict lnhehat2
(option xb assumed; fitted values)

. **graph twoway scatter lnhehat2 age
.
. /**********************************************************
> Here I regress log hourly earnings onto a spline function
> that allows the slope and intercept to differ by age group,
> but requires that the age/earnings relationship is
> continuous.  I graph the predicted values against age.
> **********************************************************/
.
. gen age3544s=(age>=35)*(age-35)
. gen age4554s=(age>=45)*(age-45)
. reg lnhearn age age3544s age4554s

      Source |       SS       df       MS              Number of obs =     442
-------------+------------------------------           F(  3,   438) =    5.90
       Model |  5.80142985     3  1.93380995           Prob > F      =  0.0006
    Residual |   143.48008   438  .327580092           R-squared     =  0.0389
-------------+------------------------------           Adj R-squared =  0.0323
       Total |   149.28151   441  .338506826           Root MSE      =  .57235

------------------------------------------------------------------------------
     lnhearn |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0215098   .0118198     1.82   0.069    -.0017209    .0447404
    age3544s |  -.0024112   .0196672    -0.12   0.902    -.0410651    .0362426
    age4554s |   -.035939   .0236776    -1.52   0.130    -.0824749    .0105969
       _cons |   1.790044   .3716014     4.82   0.000       1.0597    2.520387
------------------------------------------------------------------------------

. predict lnhehat3
(option xb assumed; fitted values)

. **graph twoway scatter lnhehat3 age
.
end of do-file
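
Several of the regressions in this log are reparameterizations of one another, so the group-specific intercepts and slopes can be recovered as linear combinations of coefficients. The following is a minimal sketch of how that could be checked. It is not part of LAB2.DO, it assumes the variables generated in the log are still in memory, and the names sp2534, sp3544, and sp4554 are hypothetical. It relies only on Stata's built-in lincom and mkspline commands.

```stata
* Sketch only: not part of LAB2.DO. Assumes the variables generated above exist.

* Intercept for women implied by the female-indicator regression.
reg lnhearn female educatt
lincom _cons + female

* Female-male gap in log hourly earnings among married workers.
reg lnhearn female married fmarried educatt
lincom female + fmarried

* Age slope for women when age is interacted with the female indicator.
reg lnhearn female age fage educatt
lincom age + fage

* Slope of the spline within the 35-44 and 45-54 segments.
reg lnhearn age age3544s age4554s
lincom age + age3544s
lincom age + age3544s + age4554s

* mkspline with the marginal option builds the same kind of spline terms
* (knots at 35 and 45); sp2534, sp3544, sp4554 are hypothetical names.
mkspline sp2534 35 sp3544 45 sp4554 = age, marginal
reg lnhearn sp2534 sp3544 sp4554
```

Using the coefficients reported in the log, the first lincom should give 1.173719 - .2811702 = .8925488, which is the female coefficient in the no-constant regression. The third should give .0243453 - .0261514 = -.0018061, which is the fage coefficient in the regression on mage and fage. An advantage of lincom over hand arithmetic is that it also reports a standard error and confidence interval for each combination.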