#delimit;
clear;
capture log close;
log using lab1.log,replace;

***********************************************************
LAB1.DO is a STATA do-file that inputs data from the 1997
March CPS, runs regressions, and computes test statistics.
It will be used for the first two labs.
  - written by Dan Rosenbaum, 2006

Located at my course web-site
(http://www.uncg.edu/bae/people/rosenbaum/Eco643/main.html)
is a file containing a 1 in 100 subsample of all persons in
the 1997 March CPS who are 25-54 and not in the armed
forces. The file is named cps97.raw and is stored
space-delimited ASCII. Here is a short description of the
variables in the order that they are found in the data.

AGE      = age in years
RACE     = 1 if white
         = 2 if black
         = 3 if other
FEMALE   = 1 if female, 0 otherwise
EDUCATT  = 11 if high school dropout
         = 12 if high school graduate
         = 14 if some college
         = 16 if bachelors degree
         = 18 if masters degree or above
EARN     = annual earnings in nominal dollars
WEEKS    = total weeks worked last year
HOURS    = total hours worked last year
NUMKID   = number of children
MS       = 1 if married with spouse present
         = 2 if married but spouse absent
         = 3 if separated
         = 4 if divorced
         = 5 if never married
         = 6 if widowed
WGT      = March Supplement Weight
YEAR     = year (four digits) in July of previous year
STATE    = state of residence, alphabetical order (1-51)
INSCHOOL = 1 if attending school, 0 otherwise
UR       = state unemployment rate (in percentage points)
***********************************************************;

***********************************************************
I start by inputting the data using an INFILE statement,
since the data is space-delimited rather than
tab-delimited. I also calculate summary statistics for
the sample.
***********************************************************;
infile age race female educatt earn weeks hours numkid ms
       wgt year state inschool ur using cps97;
sum;

***********************************************************
Here I create a variable giving the log of hourly
earnings. Note that it is undefined for those who have not
worked during the last year. Also, I restrict the sample
to those with hourly earnings between $4 and $100.
***********************************************************;
gen hearn=earn/hours if earn>0 & hours>0;
replace hearn=. if hearn>100 | hearn<4;
gen lnhearn=log(hearn);

***********************************************************
Here I regress hourly earnings on education in a number
of different ways. We will talk about how to interpret the
education coefficient in these different cases.
***********************************************************;
reg hearn educatt;        /* level level model */
gen hearnc=hearn*100;     /* hourly earnings in cents per hour */
reg hearnc educatt;
gen eddays=educatt*180;   /* education in days */
reg hearn eddays;
reg lnhearn educatt;      /* log level model */
gen lnhearnc=log(hearnc);
reg lnhearnc educatt;     /* will educatt coefficient change? */
gen lneduc=log(educatt);
reg lnhearn lneduc;       /* log log model */

***********************************************************
Here I generate age squared, which will make it possible
for me to test whether age has a non-linear effect on log
hourly earnings.
***********************************************************;
gen age2=age*age;
reg lnhearn educatt age age2;

***********************************************************
Here I do partial regression in order to demonstrate how
variation is used in a multivariate OLS setting. PREDICT
with the RESID option creates a variable giving the
residual values. In the first PREDICT statement, this
variable is named EHATHE.
***********************************************************;
reg lnhearn age age2;
predict ehathe,resid;
reg educatt age age2 if lnhearn~=.;
predict ehated,resid;
reg ehathe ehated, noc;   /* noc = no constant term */
reg ehathe ehated age age2;
reg lnhearn educatt age age2;

***********************************************************
Here I perform a variety of tests.
***********************************************************;
test educatt;             /* a simple t-test */
test age age2;            /* testing the joint significance of the age coefficients */
test age=0;
test age2=0,a;            /* a = accumulate with the previous test */
test educatt age age2;    /* testing overall significance */
gen lnh_ed=lnhearn-0.1*educatt;
reg lnh_ed;               /* restricted model for a test with three restrictions */
reg lnhearn educatt age age2;
test educatt=0.1;
test age age2, ac;        /* ac = accumulate with the previous test */

***********************************************************
Here I test whether log hourly earnings increase with age
at various age levels: 25, 40, and 55. Note that with a
quadratic age term, the marginal effect of age is given by
b[age] + 2*age*b[age2].
***********************************************************;
gen d25=_b[age]+2*25*_b[age2];
sum d25;
test age+50*age2=0;
gen d40=_b[age]+2*40*_b[age2];
sum d40;
test age+80*age2=0;
gen d55=_b[age]+2*55*_b[age2];
sum d55;
test age+110*age2=0;
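
***********************************************************
As a possible complement to the SUM and TEST statements
above, the marginal effect of age at each age level could
also be obtained with LINCOM, which reports the point
estimate together with its standard error and confidence
interval. This is only a sketch and not part of the
original lab. It assumes the variables created earlier in
this file (lnhearn, educatt, age, age2) exist, and it
re-runs the regression so that the stored coefficients are
the intended ones.
***********************************************************;
reg lnhearn educatt age age2;
lincom age+50*age2;       /* marginal effect of age at 25 */
lincom age+80*age2;       /* marginal effect of age at 40 */
lincom age+110*age2;      /* marginal effect of age at 55 */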