
AP Statistics
Linear Regression & Minitab: Interpreting Important Variables

Please do all work in your notebook, as there is not enough room on the handout.

I. Minitab / Computer Printouts

Below is a computer output. We will discuss which numbers you need to know, what they mean, and how to interpret them, so follow along closely.

Callouts on the printout:
SE Coef = 0.3839 represents the standard deviation of the slope.
S = 13.40 represents the standard deviation of the residuals.

Constant: -87.12
This is the y-intercept, the value of the response variable when the explanatory variable is 0. Check the context of the situation, because often there can be no such value. In this case, it is not possible to have a volume that is negative, nor is it possible to have a height of zero.

Height / Slope: 1.5433
This is the coefficient of the explanatory variable, so it is the slope. Every number on this line of the printout concerns regression inference for the slope. For each increase in height of one unit, the volume is expected to increase by approximately 1.5433 units. (Actual units were not provided.)

Prediction equation (i.e., least squares regression line): predicted volume = -87.12 + 1.5433(height)
This is the equation used to make predictions, and it is based on only one sample.

SE Coef: 0.3839
This is the standard deviation of the slope. Remember, this data came from only one sample, and we would expect the slope to vary a little from sample to sample. If we gathered repeated samples, we would expect the slope of the line relating tree volume to height to vary by approximately 0.3839 units.

S: 13.40
This is the standard deviation of the residuals. It measures the typical amount by which the observed values differ from the predicted values. In context, the observed volumes of the trees differ from the predicted volumes by approximately 13.40 units on average.

R-Sq: 35.8%
This is the coefficient of determination, which is the fraction or proportion of variation in the y values that is explained by the least squares regression of y on x.
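As a quick check on how the printout entries fit together, here is a minimal Python sketch. It only redoes arithmetic on the numbers quoted above (slope, SE Coef, and R-Sq); the variable names are made up for this illustration and are not part of Minitab.

```python
import math

# Values copied from the printout discussed above.
slope = 1.5433      # Coef for Height
se_slope = 0.3839   # SE Coef for Height
r_squared = 0.358   # R-Sq = 35.8%

# The t-ratio for the slope is the estimate divided by its standard error.
t_ratio = slope / se_slope

# The correlation is the square root of R-Sq, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"t = {t_ratio:.2f}")  # t = 4.02
print(f"r = {r:.3f}")        # r = 0.598
```

Both results match the T and correlation values reported for this printout.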
About 35.8% of the variation in volume can be explained by the least squares regression of y (volume) on x (height).

r: 0.598
This is the correlation coefficient. It tells you the strength and direction of the linear relationship. With an r value of 0.598, there is a weak-to-moderate, positive linear relationship between height and volume of trees.

T: 4.02
This is the test statistic, which equals the slope divided by its standard deviation: t = b_{1} / SE Coef = 1.5433 / 0.3839 = 4.02.

P: 0
This is the p-value of a linear regression t test. With a p-value of approximately 0, which is less than any common alpha level (0.05, 0.01), reject the null hypothesis. There is evidence that there is a linear relationship between the volume of a tree and its height.

Example 2: Minitab / Computer Printouts

Regression Analysis: Height versus Mother Height

The regression equation is
Height = 24.7 + 0.640 Mother Height

This is the estimated regression equation, y = b_{0} + b_{1}x, where Height is the dependent variable y, 24.7 is the intercept b_{0}, 0.640 is the slope b_{1}, and Mother Height is the independent variable x.

Predictor   Coef     SE Coef   T      P
Constant    24.690   8.978     2.75   0.009   <- IGNORE these values
Mother H    0.6405   0.1394    4.59   0.000   <- tests H_{0}: β_{1} = 0 vs. a two-tailed alternative

Column meanings: Coef = estimates, SE Coef = standard deviations of the estimates, T = test statistics, P = p-values.
The Constant row gives the intercept, b_{0}. The Mother H row gives the slope, b_{1}. Testing H_{0}: β_{1} = 0 is equivalent to testing for linear correlation between x and y.

S = 2.973   R-Sq = 35.7%   R-Sq(adj) = 34.0%

S is the standard error of estimate, s_{e}.
R-Sq is the coefficient of linear determination, r^{2}. The coefficient of linear correlation is the square root of r^{2}, with the same sign as the slope, b_{1}.
R-Sq(adj) is the adjusted r^{2}, used for multiple regression.

For a simple linear regression, Minitab conducts only two-tailed tests.

S: 2.973 is the standard error of the estimate, or the standard deviation of the residuals (actual values versus predicted values).
SE Coef: 0.1394 is the standard deviation of the slope.

Practice Minitab / Computer Printouts

1. A sample of men agreed to participate in a study to determine the relationship between several variables including height, weight, waist size, and percent body fat.
A scatterplot with percent body fat on the y-axis and waist size (in inches) on the horizontal axis revealed a positive linear association between these variables. Computer output for the regression analysis is given below:

Dependent variable is: %BF
R-squared = 67.8%
S = 4.713 with 250 - 2 = 248 degrees of freedom

Variable   Coefficient   se of coeff   t-ratio   prob
Constant   -42.734       2.717         -15.7     <.0001
Waist      1.70          0.0743        22.9      <.0001

(a) Write the equation of the regression line.
(b) Explain/interpret the information provided by R-squared in the context of this problem. Be specific.
(c) One of the men who participated in the study had waist size 35 inches and 10% body fat. Calculate the residual associated with the point for this individual.

Determine the LSRL, standard deviation for slope, correlation coefficient, and standard error of the residuals for each:
#2.
#3.

II. More Practice with Linear Regression and Residual Plots

4. Fast food is often considered unhealthy because much fast food is high in fat and calories. The fat and calorie content for a sample of 5 fast-food burgers is provided below.

Fat (g)   Calories
31        580
35        590
39        640
39        680
43        660
On the calculator: STAT > CALC > 8:LinReg(a+bx) > L1, L2, Y1
To get the Y1 to show up: VARS > Y-VARS > 1:Function > 1:Y1
This will graph the LSRL along with your scatterplot. If you go to the Y= screen, you will now see the equation for the LSRL.
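For anyone checking the calculator by hand or on a computer, the same least squares fit can be sketched in plain Python. This is only an illustration using the usual formulas b = Sxy / Sxx and a = (mean of y) - b(mean of x); the variable names are made up for the example, with the fat values playing the role of L1 and the calories playing the role of L2.

```python
# Fat (g) and calories for the five burgers in problem 4.
fat = [31, 35, 39, 39, 43]             # explanatory variable (like L1)
calories = [580, 590, 640, 680, 660]   # response variable (like L2)

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(calories) / n

# Sums of cross-products and squares of deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, calories))
sxx = sum((x - mean_x) ** 2 for x in fat)

b = sxy / sxx            # slope of the LSRL
a = mean_y - b * mean_x  # intercept, matching the a + bx form

# Residual = observed - predicted; these are what a residual plot shows.
residuals = [y - (a + b * x) for x, y in zip(fat, calories)]

print(f"predicted calories = {a:.2f} + {b:.3f}(fat)")
for x, res in zip(fat, residuals):
    print(f"fat = {x}, residual = {res:.2f}")
```

The residuals from a least squares line always sum to zero (up to rounding), which is a handy check before sketching a residual plot.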
5. Becky’s parents have kept records of her height since she was born. The data set consists of Becky’s age in months and her height in centimeters. The summary statistics for the data are provided below:

Mean age: 44 months     Std. dev. age: 8.5 months
Mean ht: 82 cm          Std. dev. ht: 4.1 cm

The correlation between age and height is 0.86.

(a) Find the equation of the least squares line that you would use to predict Becky’s height from her age. Show all work.
(b) What real-world information does the slope provide? Be specific!
(c) Suppose height had been measured in inches rather than in centimeters. What would be the correlation between age and height in inches? Note: 1 inch = 2.54 cm
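When only summary statistics are given, the least squares line comes from two standard formulas: slope b_{1} = r(s_y / s_x) and intercept b_{0} = (mean of y) - b_{1}(mean of x). The sketch below shows the pattern in Python. The function name and the numbers in the example call are hypothetical and are deliberately not the values from problem 5, so the work for that problem is still yours to show.

```python
def lsrl_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept, slope) of the least squares line of y on x."""
    slope = r * sd_y / sd_x              # b1 = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical summary statistics, chosen only to illustrate the formulas.
intercept, slope = lsrl_from_summary(mean_x=10, sd_x=2, mean_y=50, sd_y=5, r=0.8)
print(f"predicted y = {intercept:.1f} + {slope:.1f}x")  # predicted y = 30.0 + 2.0x
```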