Multiple Linear Regression in Excel with Applications in Multifactor Modeling
With the success of Microsoft Excel in the mid-80′s, came the ability to perform multiple linear regression (MLR) within Excel (also known as curvi-linear regression). The regression can be performed in three way: (1) using the LINEST function, (2) using the Regression option after adding-in the Analysis Toolpack add-in, or (3) via Solver after adding-in the Solver add-in.
One potential application for utilizing MLR is to construct a multifactor model of the stock market. In this post I will perform the three types of analysis described above to examine three potential macro-economic factors (i.e., Real GDP, Inflation via the CPI Index, and the 90-Day T-Bill Rate) against the S&P 500 (a proxy for the US stock market) to see if any of these factors influence the stock market, and should thus be included in a multifactor model.
The following attached model should be referenced from this point on: MLR Analysis
Data for the stock market was obtained from Yahoo Finance, and data for the three factors was obtained from The Federal Reserve of St. Louis’ FRED (Federal Reserve Economic Data), with the raw data each being presented in their own Worksheets of the Workbook. All of the raw data is then compiled in to columns B through F of the Compiled Data Worksheet, and is then converted to annual YoY percentages in Columns G through J.
The reader is left to discover how to use the LINEST function on their own, but it is a simple process outlined clearly on Microsoft’s website, or numerous other websites on the web.
The LINEST Output that can be found on the Compiled Data Worksheet of the attached Workbook is pictured below:
Cells G37:J41 are the LINEST fit for the three variables against the S&P 500. The orange rows are the coefficients and their standard error – a ratio of these in row 43 yields their T-Statistics. The larger value of 4.12 (cell I43) for GDP looks the most promising, and this can also be seen by the coefficient itself having the highest activity of 5.62 (cell I 37).
Excel performs LINEST at 95% probability, so α (alpha) for the T-Statistic and F-Statistic is at 0.05. If you desire statistics with a different probability, then use the Regression program via the Analysis Toolpack where you can vary the α. For this regression, Cell G49 gives the F-Distribution’s percentage points based upon the V1 and V2 degrees of freedom. Since the F Statistic in Cell G40 is greater than this, then the regression is acceptable, as can also be seen by the inverse of the right-tailed F Probability in Cell G48 being 0.7%. This means that there is a 0.7% chance of finding a better data set (i.e., a dataset with a F Statistic cell G40 being greater than 5.96). Likewise, the negative of the left-tailed inverse of the Student’s t-distribution in Cell M39 yields a T-Statistic of 1.70 for an α=0.05, and in looking at the T-Statistics for the coefficients in Row 43, you can clearly see that only the GDP is acceptable. A R2 of 0.39 in Cell G39 is acceptable, but not stellar – a R2 above +0.6 or below -0.6 is ideal, and the closer to 1 or -1 the better.
Columns K through O are a manual analysis of the data, replicating the major statistics generated by LINEST (see Cells M37:N38). If you plan to use Solver to optimize a set of data as described in the opening paragraph of this post, then you will need these statistics to do so.
Columns R:S, U:V, and X:Y are individual LINEST regressions of the three combinations: S&P 500 vs GDP, CPI and 90-Day T-Bill respectively. Now that you are getting familiar with the statistics, you can quickly see that only the GDP holds statistical credibility based upon both the F and T statistics.
To do this analysis correctly, these same factors should be fit against numerous other proxies for the market, such as the Russel 3000, Wilshire 5000, etc. to verify that similar F and T statistics s are obtained. If the sign and approximate T Statistic is maintained for GDP, then this is definitely a factor to include in your macroeconomic multifactor model of the market.
Once you have identified 4-6 factors that significantly impact the market 9and modeled these against the market), then you can fit the factors against your own portfolio to obtain a predictive model for your portfolio. By fitting individual stocks to these same factors, you can maybe identify undervalued stocks, and predict their impact on your portfolio – these are all potential future blog posts. Factors that have been shown to be material in research papers on an individual stock basis include E/P (Earnings over Price), M/B (Market value over book value), and β (Beta), to only mention a few of the more significant factors.
Install the Analysis Toolpack add-in, then select Data Analysis under your data menu; once launched select Regression from the data analysis options, and setup your regression as follows:
This will create a new Workbook tab identical to the Regression Worksheet in the attached Workbook. And if you compare the curve fit and statistics supplied by the regression package, it matches what we obtained and/or derived from the LINEST function.
Solver can be used to converge on the optimum coefficient values to reduce the standard error for the y-estimate while maintaining an R-Squared (the correlation coefficient) between -1 and 1. Refer to the Solver Worksheet in the attached Workbook. First install the Solver add-in. Once installed, launch solver from your data menu. You should see the following dialog box with Solver already set-up for optimizing the coefficients as per the parameters input below.
You will see that the exact coefficients are derived, and the manually calculated statistics match those from LINEST as well as from the Regression, but we do miss out on several free statistics such as the standard error in the coefficients and constant, making it difficult to calculate the T-Statistics. A future post will address the statistics for deriving these, so these could be built into the Solver solution manually if desired.
Future posts will further explore the use of LINEST to solve for non-linear equations containing Log10(x), LN(X), EXP(X), and polynomials, including the fit of contours/surfaces. A future post will also address the use of MLR in the application of the Arbitrage Pricing Theory (APT).
©2011 Ben Etzkorn