Hints on using Microsoft Excel.

IMPORTANT- If the Tools pull down menu does not have Data Analysis as one of the options, click Add-ins and check the boxes for Analysis Toolpak and Analysis Toolpak-VBA. The Data Analysis option should now appear.

Basic Excel Spreadsheet tools

1. Creating formulas. Highlight a cell and press the "=" key. Whatever follows the "=" key Excel treats as a formula. For example, suppose you wish to standardize the value 50 using the formula (value - mean)/st. dev, where the mean = 75 and st. dev = 20. In the cell where you would like the result to appear, type: =(50-75)/20
Hit enter and the value -1.25 will appear in the cell. In excel, "*" represents multiplication and "^" indicates exponentiation. Note also that Excel doesn't assume multiplication unless explicitly directed. Thus, 20*(14-3) will not work as 20(14-3), as it will on most graphics calculators.

2. Cell references. The real power of the spreadsheet is that cell references can be included in formulas. For example, suppose a list of values is in the cells A1:A9, and we wish to standardize each of these values and put the results in the cells B1:B9. The formula =(A1-75)/20 can be entered in cell B1, then this cell copied to the cells B2:B9. By default Excel changes the cell reference each time, so that it computes the formula =(A2-75)/20 in cell B2, etc. for all the cells. (An easy shortcut for copying is to highlight a cell, grab the lower right corner of the cell, and then drag the cursor until the target cells are highlighted, then release.)
 

Summary statistics

1. Tools>Data Analysis>Descriptive statistics will automatically generate a set of summary statistics, but this set may be incomplete (no quartiles, for example).

2. Various functions can be used to compute mean, median, quartiles, standard deviation, etc (just about anything you can think of). These must be computed one at a time. Use the fx button to locate functions. A few examples:

AVERAGE
MAX
MEDIAN
MIN
QUARTILE
STDEV
STDEVP
VAR
VARP
 
 

Creating graphical displays

1. Histograms. Choose Data Analysis>Histogram and enter the appropriate ranges for the data (and the bin, if you have created one.) Be sure to check the box at the bottom that says "Chart Output", since curiously the Histogram procedure will not produce the histogram by default, but only the frequency table. The procedure will create what it believes are the most appropriate ranges for the groups if the "bin" range is left blank. If you wish to specify those ranges, create a column containing the upper endpoints of the class intervals to be used to group the data (this is called the "bin"). For instance, if you wish to group the data into the groups 11-20, 21-30 and 31-40, then the bin column would contain the values 20, 30 and 40. Enter the range of these values as the bin range.
Note: By default, the bars will be drawn with "gaps," which are not appropriate for histograms. To "close the gaps," after the chart is produced right click on one of the bars and choose Format data series then then Options tab. Set the "gap width" to zero.

2. Boxplots. Excel will not voluntarily produce a boxplot. According to Microsoft, one way to get it to produce one is described here (although I've never actually ssen it done): http://support.microsoft.com/support/kb/articles/q155/1/30.asp

3. Bar and pie charts. Use the Chart Wizard (Insert>Chart) and choose the desired chart. The data must already be totaled for each category.

4. Graphing functions.
 

Linear regression and correlation

1. Scatterplots. Use the Chart Wizard. Select XY (Scatter). Specify the data range (all data) and be sure to tell whether the data is in rows or columns (careful, the default is rows!) The scatterplot will be displayed as you go, so you should be able to tell if it is being graphed correctly.) While at step 2, click the Series tab and remove any series that you don't wish to appear. Step 3 allows you to label the axes and also give the chart a title. Step 4 allows you to create the graph on a new sheet of the existing sheet. You're done!

2. Linear Regression. Use the Tools menu and choose Data Analysis. Choose Regression. Specify the ranges for the x (explanatory) and y (response) variables. Check the boxes for Residuals and Residual Plots. Click OK. This will give you r (called Multiple R), r-square, the slope and intercept of the least squares regression equation, predicted and residual values for each observation, and a residual plot.
 
 

Sampling and generating random numbers

1. Generating a SRS. When planning a SRS it is necessary to generate a series of random numbers which will determine which elements are sampled from the population list. Suppose we want to randomly select a page of the WKU student directory (pp. 75 - 171). Choose Tools>Data Analysis>Random number generation. In the dialog box request 1 variable, "n" random numbers, Uniform distribution and supply the range between: 75 and 171. Choosing Uniform distribution ensures that each of the (real) numbers between 75 and 171 will be equally likely to be generated. This will generate a column of n random numbers between 75 and 171.

2. Choosing a random sample from a population of values already entered into a column of the worksheet. Choose Tools>Data Analysis>Sampling. Input the range containing the population values, choose the Random Sampling Method and under Number of samples enter n, the sample size. This will generate a random sample of size n from the population. Repeat as necessary.

3. Generating a random sample from a specified probability distribution. Choose Tools>Data Analysis>Random number generation. "Number of variables" specifies the number of columns that will be filled, and "Number of random numbers" is the number of values in each column. Choose the "Distribution" and the associated "Parameters". If you wish to sample from a discrete distribution which is not listed, you can enter the probability distribution (in tabular form) on the worksheet and then choose Discrete as the "Distribution" and in the "Parameters" section give the range containing the distribution table.
(To generate exponential distribution values using Excel, recall that the exponential distribution is a special case of the gamma distribution. Excel does have the following function:
    GAMMAINV(probability,alpha,beta): Returns the inverse of the gamma cumulative distribution. Probability is the probability associated with the gamma distribution. Alpha is a parameter to the distribution. Beta (called theta in our text) is a parameter to the distribution.
Randomly generate a list of probabilities using Uniform [0,1] random numbers, then evaluate each of these using the GAMMAINV function specifying "alpha" to be the mean and "beta" equal to 1.
 

Computing probabilities for random variables

There are functions for computing both the probability density (or mass) function value and the cumulative distribution function (CDF) value for many probability distributions. Simply fill a column with values of the random variable for which probabilities are to be calculated. Then hit the function fx key. Choose the appropriate function and follow the directions for filling in the required fields. Be sure to give the cell location for the value of the random variable when creating the function. Then once the function is created for one cell, you can drag/copy that cell to the remainder of cells in that column and the location of the adjacent random variable value will automatically be inserted. The functions available are:
BETADIST Returns the cumulative beta probability density function
BINOMDIST Returns the individual term binomial distribution probability
CHIDIST Returns the one-tailed probability of the chi-squared distribution
EXPONDIST Returns the exponential distribution
FDIST Returns the F probability distribution
GAMMADIST Returns the gamma distribution
HYPGEOMDIST Returns the hypergeometric distribution
LOGNORMDIST Returns the cumulative lognormal distribution
NEGBINOMDIST Returns the negative binomial distribution
NORMDIST Returns the normal cumulative distribution
NORMSDIST Returns the standard normal cumulative distribution
POISSON Returns the Poisson distribution
TDIST Returns the Student's t-distribution
WEIBULL Returns the Weibull distribution

For example, to create a table of hypergeometric probabilities when sampling n=10 from a set of N=50 objects containing 20 "successes" and 30 "failures", first enter the possible x values, 0,1,...,10 in cells A1,A2,...,A11. Then highlight cell B1 and hit the fx button. Look under Statistical functions and choose HYPGEOMDIST. Enter A1 for Sample_s, 10 for Number_sample, 20 for Population_s and 50 for Number_pop. Click OK, and the probability for x = 0 will appear in B1. Now position the cursor in the lower right hand corner of cell B1 until a solid black cross appears, then click, hold, and drag down to B11. When you let go, the rest of the probabilities will appear.
 

Finding quantiles of probability distributions

These functions will return a quantile of a distribution, when given a probability and other necessary information. They are found by clicking the fx button. A few useful functions:

NORMINV: Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation.
NORMSINV worksheet function
TINV worksheet function
GAMMAINV(probability,alpha,beta): Returns the inverse of the gamma cumulative distribution. Probability is the probability associated with the gamma distribution.

Confidence intervals and p-values

1. Margins of error.
a) Estimating the mean of a population using a single SRS.
1) Using the normal distribution. The CONFIDENCE function computes the margin of error associated with estimating the mean of a population, assuming the distribution of the sample mean is normal and the population standard deviation is known, i.e., using the formula z*(sigma/sqrt(n)). You must supply alpha = 1 - (confidence level), sigma, and n.
2) Using the t distribution. There is no function which will compute the margin of error using the formula t*(s/sqrt(n)). However, using Data Analysis<Descriptive Statistics check the box Confidence interval for the mean and specify the confidence level. This will return the margin of error for the confidence interval, using the t distribution. In fact, for most practical applications for which sigma is unknown (i.e., in situations such as your project analysis), this is the preferred method.
b) Estimating the proportion of a population using a single SRS. Use the CONFIDENCE function with sigma = sqrt[p(1-p)].

2. P-values.
a) Testing the mean or proportion of a population using a single SRS.
1) Using the Normal distribution. The function NORMSDIST will return the area to the left of any z value input(exactly like Table A in the text). This value can be used to determine any p-value, depending on the alternative hypothesis.
2) Using the t distribution. The TDIST will return the area to the right of any t value (called "x" in the dialog box, for some strange reason), for given degrees of freedom. You can specify #tails to designate either a one or two tailed p-value (specifying two tails will double the one-tailed value)