clear
capture log close
log using lab2.log, replace

/**********************************************************
LAB2.DO is a STATA do-file that inputs data from the 1997
March CPS and runs a series of regressions focusing on the
interpretation of qualitative independent variables.
     - written by Dan Rosenbaum, 2006

Located at my course website
(http://www.uncg.edu/bae/people/rosenbaum/Eco643/main.html)
is a file containing a 1 in 100 subsample of all persons in
the 1997 March CPS who are 25-54 and not in the armed
forces.  The file is named cps97.raw and is stored as
space-delimited ASCII.  Here is a short description of the
variables in the order that they are found in the data.

AGE      = age in years
RACE     = 1 if white
         = 2 if black
         = 3 if other
FEMALE   = 1 if female, 0 otherwise
EDUCATT  = 11 if high school dropout
         = 12 if high school graduate
         = 14 if some college
         = 16 if bachelors degree
         = 18 if masters degree or above
EARN     = annual earnings in nominal dollars
WEEKS    = total weeks worked last year
HOURS    = total hours worked last year
NUMKID   = number of children
MS       = 1 if married with spouse present
         = 2 if married but spouse absent
         = 3 if separated
         = 4 if divorced
         = 5 if never married
         = 6 if widowed
WGT      = March Supplement Weight
YEAR     = year (four digits) in July of previous year
STATE    = state of residence, alphabetical order (1-51)
INSCHOOL = 1 if attending school, 0 otherwise
UR       = state unemployment rate (in percentage points)
**********************************************************/

/**********************************************************
I start by inputting the data using an INFILE statement,
since the data is space-delimited rather than
tab-delimited.  I also calculate summary statistics for the
sample.
**********************************************************/

infile age race female educatt earn weeks hours numkid ms ///
       wgt year state inschool ur using cps97
sum

/**********************************************************
Here I create a variable giving the log of hourly
earnings.  Note that it is undefined for those who have not
worked during the last year.  Also, I restrict the sample
to those with hourly earnings between $4 and $100.
**********************************************************/

gen hearn=earn/hours
replace hearn=. if hearn>100 | hearn<4
gen lnhearn=log(hearn)
sum hearn lnhearn

/**********************************************************
Here I regress log hourly earnings onto a female indicator
and educational attainment.
**********************************************************/

reg lnhearn female educatt

/**********************************************************
Here I regress log hourly earnings onto male and female
indicators and education, leaving out the intercept term.
(The first REG statement incorrectly includes both
indicators and the intercept.)
**********************************************************/

gen male=1-female
reg lnhearn male female educatt
reg lnhearn male female educatt, noc

/**********************************************************
Here I regress log hourly earnings onto three separate age
category indicators and educational attainment.
**********************************************************/

gen a2534=age<=34
gen a3544=age>=35 & age<=44
gen a4554=age>=45 & age<=54
reg lnhearn a2534 a3544 a4554 educatt, noc
reg lnhearn a3544 a4554 educatt

/**********************************************************
Here I regress log hourly earnings onto a female indicator,
a married indicator, a female*married interaction, and
educational attainment.
**********************************************************/

gen married=ms==1 | ms==2
gen fmarried=female*married
reg lnhearn female married fmarried educatt

/**********************************************************
Here I regress log hourly earnings on a female indicator,
age, and educational attainment, allowing for an effect of
age on hourly earnings that differs for males and females.
**********************************************************/

gen fage=female*age
gen mage=male*age
reg lnhearn female age fage educatt
reg lnhearn female mage fage educatt

/**********************************************************
Here I regress log hourly earnings onto a nonlinear
function of age that allows for different slopes for
different age groups.  The intercept is assumed to be the
same for all groups.  I graph the predicted values against
age.
**********************************************************/

gen age2534=age*a2534
gen age3544=age*a3544
gen age4554=age*a4554
reg lnhearn age2534 age3544 age4554
predict lnhehat1
**graph twoway scatter lnhehat1 age

/**********************************************************
Here I regress log hourly earnings onto a nonlinear
function that allows the slope and intercept to differ by
age group.  I graph the predicted values against age.
**********************************************************/

reg lnhearn a3544 a4554 age2534 age3544 age4554
predict lnhehat2
**graph twoway scatter lnhehat2 age

/**********************************************************
Here I regress log hourly earnings onto a spline function
that allows the slope and intercept to differ by age group,
but requires that the age/earnings relationship is
continuous.  I graph the predicted values against age.
**********************************************************/

gen age3544s=(age>=35)*(age-35)
gen age4554s=(age>=45)*(age-45)
reg lnhearn age age3544s age4554s
predict lnhehat3
**graph twoway scatter lnhehat3 age
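
/**********************************************************
A possible follow-up (a sketch, not part of the original
lab): in the spline regression above, the coefficient on
AGE is the slope for ages 25-34, and the coefficients on
AGE3544S and AGE4554S are the changes in slope at ages 35
and 45.  Assuming that regression is still the most recent
estimation command, LINCOM can report the implied slope
(with a standard error) for each of the two older age
groups.
**********************************************************/

lincom age + age3544s
lincom age + age3544s + age4554s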