Forecast monthly excess returns. According to Welch and Goyal (2008), many economic variables that predict the equity premium in sample fail to deliver consistent out-of-sample forecasting gains relative to the historical average (prevailing mean) forecast. They argue that model uncertainty and instability seriously impair the forecasting ability of individual predictive regression models. A copy of their paper [“A Comprehensive Look at The Empirical Performance of Equity Premium Prediction”
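The out-of-sample comparison described above can be sketched in plain Python. A univariate predictive regression of next month's excess return on a predictor observed this month is re-estimated every month on an expanding window and scored against the historical-average forecast with the out-of-sample R² (one minus the ratio of the two models' squared forecast errors), which is negative whenever the regression does worse than the historical average. The function names `ols_fit` and `oos_r2`, the expanding-window scheme, the `start` parameter, and the simulated data in the demo are assumptions for illustration, not the authors' exact procedure.

```python
import random


def ols_fit(x, y):
    """Intercept a and slope b of the univariate OLS regression y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def oos_r2(premium, predictor, start):
    """Out-of-sample R^2 of a predictive regression vs. the historical average.

    premium[t]   -- excess return realized in month t
    predictor[t] -- predictor value known at the end of month t
    For every month t >= start, both forecasts of premium[t + 1] use only
    data observed through month t (expanding estimation window).
    """
    sse_model = 0.0  # squared errors of the predictive regression
    sse_mean = 0.0   # squared errors of the historical-average forecast
    for t in range(start, len(premium) - 1):
        # Regression pairs (predictor[s], premium[s + 1]) for s = 0..t-1
        a, b = ols_fit(predictor[:t], premium[1:t + 1])
        model_fc = a + b * predictor[t]
        # Historical average of all excess returns observed through month t
        mean_fc = sum(premium[:t + 1]) / (t + 1)
        actual = premium[t + 1]
        sse_model += (actual - model_fc) ** 2
        sse_mean += (actual - mean_fc) ** 2
    # Negative when the regression forecasts worse than the historical average
    return 1.0 - sse_model / sse_mean


if __name__ == "__main__":
    # Hypothetical data: a persistent AR(1) predictor that is unrelated to
    # the simulated monthly excess returns, 240 months in total.
    random.seed(0)
    x = [0.0]
    for _ in range(239):
        x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
    r = [random.gauss(0.005, 0.04) for _ in range(240)]
    print(f"out-of-sample R^2: {oos_r2(r, x, start=120):.4f}")
```

When the predictor carries no information, as in the simulated demo, the regression only adds estimation noise, so its out-of-sample R² will usually come out below zero; this is the kind of failure to beat the historical average that the paragraph above describes.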


The core data structure in Spark is the resilient distributed data set (RDD). As the name suggests, an RDD is Spark's representation of a data set that is partitioned across the memory (RAM) of the machines in a cluster; it is "resilient" because Spark can rebuild a lost partition by replaying the transformations that produced it. An RDD is a collection of arbitrary elements, so it can hold tuples, dictionaries, lists, and so on. As with a pandas DataFrame, we can load a data set into an RDD and then call any of the methods available on that object.