/* COMPUTER LAB # 5 ECO 648, UNCG, Prof. Bearse In this lab, we learn to specify, diagnose, and forecast a univariate time series with an ARMA model. DATA: c.txt (on 648 homepage) PROCEDURES: infosub.src infosat.src (also on 648 homepage) Save the data to the same directory as your lab5.rtw file. Save the procedures to your WINRATS directory. */ SOURCE(NOECHO) N:\ECO\PMBEARSE\WINRATS\infosat.src SOURCE(NOECHO) N:\ECO\PMBEARSE\WINRATS\infosub.src * * NOTE: You must point RATS to the * directory where YOU saved * the procedures. * By default, RATS looks to * directory from which the program * is run. * * CALENDAR 1959 1 12 * The format of this statement is * CALENDAR 'year data starts' 'period data starts' 'Data Span' * where Data Span = 12 for monthly data * 4 for quarterly data * 1 for annual data ALLOCATE 1991:1 * The format of this statement is * ALLOCATE 'date the data ends' * e.g., if monthly data that ends in Jan. of 1999, * we use * ALLOCATE 1999:1 * if quarterly data that ends in the 1st quarter * of 1999, we also use * ALLOCATE 1999:1 * OPEN DATA c.txt DATA(FORMAT=free,ORG=obs) / dates rawc * Monthly data on real personal consumption exp. * from 1959:1 through 1999:1. (Measured in * billions of Chained 1992 dollars. Seasonally * adjusted.) graph 1 # rawc end SET c = LOG(rawc(t)) - LOG(rawc(t-1)) * We will work with the first differences of * the natural logarithm of consumption (i.e., * approximate consumption growth) * COMPUTE start = 1959:2 * This statement defines the date when our * historical data begins. Notice that this * is 1959:2 instead of 1959:1 b/c we lost * an observation when first differencing. * COMPUTE end = 1991:1 * This statement defines the date when our * historical data ends * COMPUTE span = 12 * This statement defines the frequency of our data. * where span = 12 for monthly data * 4 for quarterly data * 1 for annual data * COMPUTE numlags = 50 * This is the number of SACF and SPACF lags you want to * compute. It also impacts the number of Q stats computed. * COMPUTE horizon = 48 * This is the number of periods ahead we will forecast (in * jargon, the "forecast horizon") * SET y = c * PRINT / y OPEN PLOT g1.rgf GRAPH(header='Real Personal Consumption Growth') 1 # y CORRELATE(PRINT,NUMBER=numlags,STDERRS=sesacf,$ PARTIAL=spacf) y / sacf * print(NUMBER=0) / sacf sesacf * print(NUMBER=0) / spacf SET sigsacf 1 numlags+1 = abs(sacf/sesacf)>1.96 SET sigspacf 1 numlags+1 = abs(sqrt(%NOBS)*spacf)>1.96 OPEN PLOT g2.rgf GRAPH(HEADER='SACF',STYLE=BARGRAPH,NUMBER=1,MAX=1.0,$ MIN=-1.0,SHADING=sigsacf) 1 # sacf 2 numlags+1 * * The SACF is significant at lags 1, 20, and 24. * If we viewed the significant results at lags * 20 and 24 as due to chance (i.e., Type I errors), * then we might conclude that an MA(1) model is * appropriate. On the other hand, if we take the * SACF plot literally, we might think that * a subset MA{1,20,24} is a good candidate. * The fact that we are dealing with monthly data * means we should not dismiss significance at lag 24 * so quickly. This is because the seasonal span here * is 12 which implies that lag 24 is just the second * seasonal lag. * open plot g3.rgf GRAPH(HEADER='SPACF',STYLE=BARGRAPH,NUMBER=1,MAX=1.0,$ MIN=-1.0,SHADING=sigspacf) 1 # spacf 2 numlags+1 * * The SPACF is significant at lags 1, 7, 20, 24, and 36. * It is possible that we are witnessing some sort of a decay * process here. If that is the case, then a pure MA * model might be appropriate. * On the other hand, if we take this picture literally, * we might think that a subset * AR{1,7,20,24,36} * model is appropriate. * Again, because we have monthly data, we should not be * quick in assuming away the significance of lags 24 and 36. * * A further point to remember is that the only time BOTH * the ACF and PACF of an ARMA process can cut off at a finite * lag is when there is white noise. If there is an AR component, * the ACF will decay. If there is an MA component, the PACF * will decay. * * Indeed, it is possible that we are witnessing decay in both * the ACF and PACF after lag 1. This would indicate an ARMA(1,1). * * * Since there is no really clear ARMA model being suggested by * the SACF and SPACF, examining some model selection criteria * may prove quite valuable here. * * Quick Primer on AIC and SBC * AIC and SBC can be used to determine the which predictors * to include in a model. * AIC stands for Akaike's Information Criterion. * SBC stands for Schwarz Bayesian Criterion. * They are computed as * AIC = NOBS*LOG(SSE) + 2*NUMPARMS * SBC = NOBS*LOG(SSE) + LOG(NOBS)*NUMPARMS * where * NOBS = the number of usable observations * SSE = Sum of Squared Errors from the fitted model * NUMPARMS = number of coefficients estimated in the fitted * model * * According to the AIC, the best ARMA model is the one with the * smallest AIC value. Similarly, according to SBC, the best model * is the one with the smallest SBC score. AIC scores are NOT * comparable with SBC scores. * * When comparing models with AIC or SBC, estimate all models over * the same sample period. This is important since, all else equal, * SSE will be smaller when there is a smaller number of usable * data observations. * * Based on what we saw in the SACF and SPACF, several candidate * models suggested themselves. These might be * MA(1) * AR(1) * Subset AR(||1,7||) * Subset MA(||1,24||) * ARMA(1,1) * Subset ARMA(||1,7,24,36||,||1,24||) * * We will examine each of these possibilities with model * selection criteria. Since the largest AR lag is 36, * when comparing models we estimate each one treating * the first 36 observations as pre-sample. * * Since the mean of our data appears to be clearly * different from zero, we include a constant in all * model evaluations. * COMPUTE maxAR = 36 @INFOSUB(CONSTANT,NOPRINT) y * ||1|| start+maxAR end * MA(1) @INFOSUB(CONSTANT,NOPRINT) y ||1|| * start+maxAR end * AR(1) @INFOSUB(CONSTANT,NOPRINT) y ||1,7|| * start+maxAR end * Subset AR(||1,7||) @INFOSUB(CONSTANT,NOPRINT) y * ||1,24|| start+maxAR end * Subset MA(||1,24||) @INFOSUB(CONSTANT,NOPRINT) y ||1|| ||1|| start+maxAR end * ARMA(1,1) @INFOSUB(CONSTANT,NOPRINT) y ||1,7,24,36|| ||1,24|| $ start+maxAR end * Subset ARMA(||1,7,24,36||,||1,24||) * Both AIC and SBC agree that Subset MA(||1,24||} * is best. Let's examine the coefficient estimates * and diagnose the residuals for this model. * * Estimate the Subset MA(||1,24||) model to obtain a fitted version boxjenk(CONSTANT,MA=||1,24||,ITERATIONS=100,DEFINE=armaeq) $ y start+24 end resids * * All the estimated coefficients are of "high * quality" --- i.e., large t-values (low p-values). * This is desirable in a forecasting model. * * * Diagnose the residuals from the fitted Model * * Plot ARMA residuals to see if they appear random open plot g4.rgf graph(header='ARMA Residuals') 1 # resids * * Examination of the plotted residuals indicates * no obvious pattern. This is of course what * we were hoping for. If there is a pattern in * the residuals, then this could indicate we are * missing something systematic in the data. * * Check whether the residuals from the fitted ARMA * model appear to be white noise using the SACF CORRELATE(PRINT,NUMBER=numlags,STDERRS=aerrs, $ QSTATS,SPAN=span) resids / acorrs set signif 1 numlags+1 = ABS(acorrs/aerrs)>1.96 * print(number=0) 1 numlags+1 signif open plot g5.rgf graph(header='SACF of ARMA Residuals',$ style=bargraph,number=1,max=1.0,min=-1.0,$ shading=signif) 1 # acorrs 2 numlags+1 * * We want the SACF of the residuals to mimic * white noise. Recall that all autocorrelations * should be zero for a white noise process. * In our case, we see two significant autocorrelations: * one at lag 8 and one at lag 20. Consequently, * we might want to try fitting some more lags * and seeing if this lowers AIC and or SBC. For * instance, we might want to try MA lags at * 8 and 20 in addition to 1 and 24. * * Examine the Ljung-Box Q stats for the Residuals * * By including the options QSTATS and SPAN=4 in the * CORRELATE statement, we told RATS to compute * Ljung-Box Q stats at intervals of 4 lags. * The Ljung-Box Q stat tests the following: * H0: SACF lags 1 through k are NOT signicantly * different from zero * (i.e., residuals consistent with white noise) * against * H1: At least one of lags 1 through k is * is significantly different from zero. * (i.e., residuals NOT consistent with white * noise) * * If the significance level for a Q stat is less than * 0.05, we conclude that at least one of SACF * lags 1 through k is different from zero at the * 0.05 level of significance. Our output shows us * that Q(12), Q(24), Q(36), and Q(48) * are NOT significant. * This is consistent with the residuals being * white noise. * * As a whole, the residual diagnostics seem fairly * satisfactory. We will proceed to forecast with * the subset MA(||1,24||). * * * Forecasting with the Subset MA(||1,24||) * * This statement defines the number of * steps ahead we will forecast (i.e., the * forecast horizon) SMPL end+1 end+horizon FORECAST(NOPRINT) 1 horizon end+1 # armaeq point * This computed the point forecasts ERRORS 1 # armaeq sef * This computed the standard errors * of the forecast errors set lower = point - 1.96*sef set upper = point + 1.96*sef * These computed the the 95% interval forecast * assuming Normally distributed errors print end+1 end+horizon lower point upper OPEN PLOT g6.rgf graph(header='History, Point Forecast, and Interval Forecast') 4 # y start end 1 # point end+1 end+horizon 2 # lower end+1 end+horizon 3 # upper end+1 end+horizon 3 * * Notice that, because the model is stationary, the * long run forecast settles down to the sample mean * of the data. Indeed, since we have a pure MA * model, the ARMA forecast equals the sample mean * for all steps greater than the largest MA lag. * END * We END the program here b/c the following code * can take quite a while to run (depending on * the sample size and how large maxar,maxma are * set. To run it, comment out the END statement. * * * A low cost (in terms of your labor, not the * computer's!) way to find the best Saturated ARMA * model according to AIC or SBC is to use the * following code. * COMPUTE maxar = 4 COMPUTE maxma = 4 @INFOSAT(NOPRINT,NOREPORT) y maxar maxma start+maxar end @INFOSAT(NOCONSTANT,NOPRINT,NOREPORT) y maxar maxma $ start+maxar end * The above two statements compute AIC and SBC for EVERY * combination of SATURATED ARMA(p,q) up through p=maxar * and q = maxma. The first call returns the best SATURATED * ARMA when a constant is included. The second call returns * the best SATURATED ARMA when NO CONSTANT is included. * (If it is obvious that you data has a nonzero mean, there * is no need to run the no constant version.) * END