So in the last post (Excel Data Analysis 5.0) we looked at deriving Robust Realized Variance (RRV), today let's look at if we could use RRV to estimate a bias for future returns on the S&P500 off regression based interpolations.

So moving on with the S&P500 RRV worksheet, next to the 60day (business day) realized variance, we will create a column for the average future 60day log return of the S&P500, "dX.f60",

Here we can see that I took the average of the following 60days of S&P500 log returns. So naturally this column must end 60days prior to the date of our last available data.

Next we highlight the columns "RVar(60)", "dX.f60" down to the last value in "dX.f60" and draw a scatter plot. On the plot, we then add a trendline with the highest R-squared (coefficient of determination) value.

The Polynomial Regression had the highest R-squared at 0.0307, it is not statistically significant enough to mean anything. However that doesn't mean this is the end of the road. We can see that the bulk of average daily returns have been increasingly negative about a quarter (60business days) following high realized variance. So what if we analyze only periods of high realized variance?

Here a new column is created for RVV above a certain threshold. The column is named "RVV+", and I chose cell G9 to place the threshold. Next to "RVV+", whenever the realized variance is below the threshold, this column gives a 0. Another new column is created for the following average 60day S&P500 daily returns, "dX.f60+" (matching "RVV+" values). The column "dX.f60+" gives a NA() whenever "RVV+" = 0.

I picked a random point in the realized variance range, 800 as a threshold for this example. Now, let's apply regression between "RVV+" and "dX.f60+",

It looks like a much better fit from the get go. The linear regression gave an R-squared of 0.7373, which is light years better than the earlier regression involving lower realized variance. This crude model could be extended, refined to estimate future index returns or develop risk management tactics, when the realized variance is relatively high.

So moving on with the S&P500 RRV worksheet, next to the 60day (business day) realized variance, we will create a column for the average future 60day log return of the S&P500, "dX.f60",

Here we can see that I took the average of the following 60days of S&P500 log returns. So naturally this column must end 60days prior to the date of our last available data.

**Realized variance vs. future average returns regression fit**Next we highlight the columns "RVar(60)", "dX.f60" down to the last value in "dX.f60" and draw a scatter plot. On the plot, we then add a trendline with the highest R-squared (coefficient of determination) value.

The Polynomial Regression had the highest R-squared at 0.0307, it is not statistically significant enough to mean anything. However that doesn't mean this is the end of the road. We can see that the bulk of average daily returns have been increasingly negative about a quarter (60business days) following high realized variance. So what if we analyze only periods of high realized variance?

Here a new column is created for RVV above a certain threshold. The column is named "RVV+", and I chose cell G9 to place the threshold. Next to "RVV+", whenever the realized variance is below the threshold, this column gives a 0. Another new column is created for the following average 60day S&P500 daily returns, "dX.f60+" (matching "RVV+" values). The column "dX.f60+" gives a NA() whenever "RVV+" = 0.

I picked a random point in the realized variance range, 800 as a threshold for this example. Now, let's apply regression between "RVV+" and "dX.f60+",

It looks like a much better fit from the get go. The linear regression gave an R-squared of 0.7373, which is light years better than the earlier regression involving lower realized variance. This crude model could be extended, refined to estimate future index returns or develop risk management tactics, when the realized variance is relatively high.

## 0 Reflections:

Post a Comment