MODELING
Since we have shown a relatively strong positive correlation between retail activity and WallStreetBets (WSB) discussion for many stocks, a natural next step is to fit models to the data and get a sense of how well WSB discussion predicts retail activity.
Doing this successfully could be quite useful: retail activity data is only available on a daily basis, whereas our WSB discussion data is live, so we could predict retail activity before the official figures are released.
For the sake of brevity, we'll fit a separate simple linear regression model for each ticker, using retail activity as the dependent variable and WSB discussion as the sole independent variable.
Fitting models for the 120 tickers, we find that 102 are highly statistically significant (p < 0.001).
To estimate the predictive power, we will examine the R-squared value of each regression, which tells us what proportion of the variation in retail activity can be explained by WSB discussion.
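To make the per-ticker fit and the R-squared ranking concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the code used for this analysis: the function names, the tickers "AAA" and "BBB", and all of the numbers are made up, and the real inputs would be each ticker's daily WSB discussion and retail activity series.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean, y_mean = fmean(x), fmean(y)
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # With a single predictor, R-squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def rank_by_r_squared(data):
    """Fit one model per ticker and sort tickers by R-squared, largest first.

    `data` maps ticker -> (wsb_discussion, retail_activity), two equal-length lists.
    """
    fits = {ticker: fit_simple_ols(x, y) for ticker, (x, y) in data.items()}
    return sorted(fits.items(), key=lambda item: item[1][2], reverse=True)


# Hypothetical toy data: x is WSB discussion, y is retail activity.
toy_data = {
    "AAA": ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]),  # perfectly linear
    "BBB": ([1, 2, 3, 4, 5], [3, 1, 4, 1, 5]),   # much noisier
}

for ticker, (slope, intercept, r_squared) in rank_by_r_squared(toy_data):
    print(f"{ticker}: slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

This sketch does not compute p-values. If SciPy is available, `scipy.stats.linregress(x, y)` returns the slope, intercept, correlation coefficient (`rvalue`) and the p-value for the slope in one call, which is one way to check significance at a threshold such as p < 0.001.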
Here are the stocks whose models have the largest R-squared values: