Time Series Forecasting with Python using Prophet
Time series forecasting is an important analytical method to master in your machine learning toolkit. We would all like to predict the future accurately, and for good reason: knowing what is likely to happen lets us make better decisions and take better actions.
Facebook is one of the companies that does this well, and it has open-sourced the procedure that powers many of its forecasting applications. In this article, you will learn how to forecast a time series using the aptly named Prophet package.
What is a Time Series?
A time series is a sequence of data points listed in time order, usually taken at equally spaced intervals.
Familiar examples include the monthly inflation numbers reported by the Bureau of Labor Statistics, the daily sales revenue at your company and, in finance, the daily closing price of a stock.
Imagine if you could correctly predict future stock prices with a forecasting algorithm: you would indeed be rich! Time series forecasting is an important concept that can bring a lot of value to you or your organization.
Everyone knows Facebook. It is almost a trillion-dollar company (by market cap), inundated with data generated by its users and its systems. Time series forecasting is crucial to Facebook's business, and the company is very good at this kind of machine learning analysis.
Prophet is Facebook's procedure for forecasting time series. It is based on an additive model, was open-sourced by Facebook's Data Science team, and is the algorithm behind many of Facebook's internal time series forecasting use cases.
As valuable as it is, time series forecasting is not easy, and the algorithms are constantly being refined and improved. Have you heard of the M-Competition? It is the competition, sponsored by companies including Uber, Amazon and Google, in which the best forecasters show their skills and test their state-of-the-art algorithms.
The M4 edition consists of 100,000 time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data.
To keep up with what the best in the industry are doing, I suggest following this competition.
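In this article we will forecast U.S. inflation using the monthly Consumer Price Index series CPIAUCSL, published on FRED by the Federal Reserve Bank of St. Louis. Rather than downloading the CSV file by hand, we can use the Python pandas package to load the dataset into memory straight from a URL. Note that this works in pandas 0.19.2 and later.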
import pandas as pd

# Monthly CPIAUCSL observations exported as CSV from FRED
url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=CPIAUCSL&scale=left&cosd=1947-01-01&coed=2019-03-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2019-04-28&revision_date=2019-04-28&nd=1947-01-01"
dfData = pd.read_csv(url)
Notice the dataset has two columns: DATE (the first day of each month) and CPIAUCSL (the index value). We are interested in predicting the monthly inflation rate, expressed as a percentage.
The monthly inflation rate is the percentage change in the index from one month to the next: the difference between two consecutive months divided by the earlier month's value. Follow the code below to calculate it in pandas.
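# Monthly inflation rate in percent: change relative to the previous month's index value
# (equivalent to dfData['CPIAUCSL'].pct_change() * 100)
dfData["PercentageChange"] = (dfData['CPIAUCSL'] - dfData['CPIAUCSL'].shift(1)) / dfData['CPIAUCSL'].shift(1) * 100
# The first month has no predecessor, so its value is NaN; Prophet skips rows where y is NaN

# Rename columns to Prophet requirements; copy() so later in-place edits don't touch dfData
df = dfData[['DATE', 'PercentageChange']].copy()
df.columns = ['ds', 'y']
df.tail()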
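As a quick sanity check on the definition, the snippet below runs the calculation on a few made-up index values (hypothetical numbers, not FRED data). It shows that dividing by the previous month's value matches pandas' pct_change(), while dividing by the current month's value gives a slightly smaller rate.

```python
import pandas as pd

# Hypothetical index values for three consecutive months (illustrative, not FRED data)
cpi = pd.Series([100.0, 101.0, 102.01])

# Standard definition: change relative to the previous month, in percent
previous_denominator = (cpi - cpi.shift(1)) / cpi.shift(1) * 100

# pandas' built-in pct_change() uses the same definition, returned as a fraction
builtin = cpi.pct_change() * 100

# Dividing by the current month instead slightly understates the change
current_denominator = (cpi - cpi.shift(1)) / cpi * 100

print(previous_denominator.round(4).tolist())
print(current_denominator.round(4).tolist())
```

In both series the first entry is NaN, because the first month has no earlier value to compare against.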
Prophet requires the time column to be named ds and the value (what we will be forecasting) to be named y.
The pandas dataframe df now contains the data we need to perform time series forecasting with Prophet.
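Prophet converts the ds column to timestamps itself, but parsing the dates up front can make plotting and date arithmetic easier. A minimal sketch, assuming a hypothetical helper named to_prophet_frame and a tiny made-up frame that mimics the FRED columns:

```python
import pandas as pd


def to_prophet_frame(frame, date_col, value_col):
    """Return a copy with the two columns Prophet expects: 'ds' (datetime) and 'y' (float).

    Hypothetical helper for illustration; it is not part of Prophet or pandas.
    """
    out = frame[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"}).copy()
    out["ds"] = pd.to_datetime(out["ds"])  # parse date strings such as '1947-01-01'
    out["y"] = out["y"].astype(float)
    return out


# Made-up values shaped like the FRED data (the first month has no change yet)
raw = pd.DataFrame({"DATE": ["1947-01-01", "1947-02-01"], "PercentageChange": [None, 0.5]})
prophet_df = to_prophet_frame(raw, "DATE", "PercentageChange")
print(prophet_df.dtypes)
```

The helper returns a copy, so the original dataframe keeps its DATE and PercentageChange columns.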
It is always a good idea to visually inspect the dataset you are aiming to forecast on. Run the following code to plot the time series we will be forecasting.
import matplotlib.pyplot as plt

df.plot(figsize=(10, 5))
plt.title("USA Monthly Inflation Rate")
plt.show()
Time Series Forecasting with Prophet
To evaluate the model, we will forecast the last 12 months of the dataset and then compare the forecast against the actual values.
To do so, let's remove the last 12 months from the df dataframe. With the pandas tail function, this is a breeze.
# Dataset to forecast: the dates of the last 12 months (Prophet only needs 'ds')
future = df.tail(12)
future = future.drop('y', axis=1)

# Drop the last 12 months of actuals from the training data
df.drop(df.tail(12).index, inplace=True)
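The same hold-out split can be written without modifying a dataframe in place, which keeps the full dataset available for later comparison. This is a sketch using a hypothetical split_last_n helper and toy monthly data; the toy_ prefixes avoid overwriting the article's future dataframe if you run everything in one session.

```python
import pandas as pd


def split_last_n(frame, n):
    """Split a time-ordered frame into (train, test), leaving the input untouched.

    Hypothetical helper for illustration; it is not part of pandas or Prophet.
    """
    if not 0 < n < len(frame):
        raise ValueError("n must be between 1 and len(frame) - 1")
    train = frame.iloc[:-n].copy()
    test = frame.iloc[-n:].copy()
    return train, test


# Toy data: 24 month-start dates with made-up values
toy = pd.DataFrame({"ds": pd.date_range("2000-01-01", periods=24, freq="MS"), "y": range(24)})
toy_train, toy_test = split_last_n(toy, 12)
toy_future = toy_test.drop(columns="y")  # predict() only needs the 'ds' column
print(len(toy_train), len(toy_test), list(toy_future.columns))
```

Because the split is positional, the frame must already be sorted by date, as the FRED download is.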
The future dataframe now holds the dates of the 12 months we will forecast. These rows have been removed from the df dataframe, which will be used for training.
Train a Model with Prophet
According to Facebook, Prophet can produce accurate forecasts with its default settings.
It also decides automatically, based on the spacing and length of the data, whether to fit daily, weekly and yearly seasonality. Our series is monthly, so we will help the model by specifying the seasonality ourselves.
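First, import the Prophet class from the Facebook Prophet package. From version 1.0 onward, the package is published as prophet instead of fbprophet.
from fbprophet import Prophet  # Prophet 1.0 and later: from prophet import Prophet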
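Create a new Prophet model. We disable the weekly seasonality because our time series is monthly.
Then, using the add_seasonality method, we add a monthly seasonality to the model. The period argument is measured in days, and fourier_order controls how flexible the seasonal curve is: higher values let it follow faster-changing patterns, at the risk of overfitting. To model a different cycle, such as a quarterly one, change the period rather than the fourier_order. Because our observations are one month apart, the pattern across the months of a year is captured by Prophet's built-in yearly seasonality, which stays enabled.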
The last line will carry out the actual model fitting.
# Fit a time series forecasting model
m = Prophet(weekly_seasonality=False)
# period is measured in days; 30.5 is the monthly period used in Prophet's documentation
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
m.fit(df)
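Prophet represents each seasonality as a truncated Fourier series: for a period P (in days) and a fourier_order N, it adds N sine and N cosine features, so fourier_order=5 means 10 seasonal regressors. The standalone NumPy sketch below builds such features to make the idea concrete; fourier_features is a hypothetical helper, and its column order may differ from Prophet's internal implementation.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Sine/cosine seasonal features for times measured in days (hypothetical helper).

    Columns 0..order-1 are sin(2*pi*n*t/period) for n = 1..order,
    columns order..2*order-1 are the matching cosines.
    """
    t = np.asarray(t_days, dtype=float)
    n = np.arange(1, order + 1)
    # angles[i, k] = 2*pi*(k+1)*t_i / period
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.sin(angles), np.cos(angles)])


# Roughly monthly time steps over two years, expressed in days
t = np.arange(0, 365.25 * 2, 30.4375)
X = fourier_features(t, period=365.25, order=5)
print(X.shape)
```

The features repeat exactly every period days, which is what makes them seasonal; raising the order adds higher-frequency terms that let the curve bend more sharply.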
Forecast with Prophet
With the model fitted, let's generate a forecast for the 12 months we held out of the training data.
Use the predict method to generate the forecast. Its input is a dataframe of timestamps in a column named ds.
The future dataframe we previously generated already has the required form.
# Forecast the 12 held-out months
forecast = m.predict(future)

# Plot the forecast
m.plot(forecast)
plt.title("Time Series Forecasting with Prophet")
plt.show()
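Here we reused the held-out dates, but when forecasting genuinely unseen months, Prophet's make_future_dataframe(periods=12, freq='MS', include_history=False) builds the future dataframe for you. The sketch below constructs an equivalent frame with plain pandas so it runs without Prophet; next_month_starts is a hypothetical helper, the starting date is only an example, and 'MS' is the pandas alias for month-start dates, matching FRED's first-of-month timestamps.

```python
import pandas as pd


def next_month_starts(last_date, periods=12):
    """Build a Prophet-style 'ds' frame for the months after last_date (hypothetical helper)."""
    # MonthBegin(1) moves a month-start date to the first day of the next month
    start = pd.Timestamp(last_date) + pd.offsets.MonthBegin(1)
    return pd.DataFrame({"ds": pd.date_range(start=start, periods=periods, freq="MS")})


future_dates = next_month_starts("2018-03-01", periods=12)
print(future_dates["ds"].iloc[0].date(), future_dates["ds"].iloc[-1].date())
```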
The plot shows the forecast together with the uncertainty interval generated by Prophet (80% by default, controlled by the interval_width parameter).
Prophet is based on an additive model: the forecast is the sum of a trend, seasonal components and, optionally, holiday effects.
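For reference, the additive decomposition described by Prophet's authors can be written as follows, where g(t) is the trend, s(t) the seasonal components, h(t) the holiday effects and ε_t the error term. Each seasonality with period P (in days) is a Fourier series, and N is the fourier_order passed to add_seasonality.

```latex
\begin{aligned}
y(t) &= g(t) + s(t) + h(t) + \varepsilon_t \\
s(t) &= \sum_{n=1}^{N} \left( a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right)
\end{aligned}
```

Prophet also offers seasonality_mode='multiplicative'; in the default additive mode used here, the multiplicative_terms columns of the forecast are zero.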
Calling m.plot_components(forecast) lets you look at each of the model's components. The components plot includes a yearly panel because we did not disable Prophet's yearly seasonality.
The Prophet forecast dataframe contains a lot of useful information that you can inspect and use to tweak your models.
Aside from the forecasted value yhat, it offers many other values you can use such as:
ds, trend, trend_lower, trend_upper, yhat_lower, yhat_upper, additive_terms, additive_terms_lower, additive_terms_upper, monthly, monthly_lower, monthly_upper, multiplicative_terms, multiplicative_terms_lower, multiplicative_terms_upper, yearly, yearly_lower, yearly_upper, yhat
To view them, select the columns you want using regular pandas dataframe indexing. Let's print yhat together with the lower and upper bounds of its uncertainty interval for each of the predicted months.
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
How good is our Model?
Let’s build a quick chart displaying both the forecasted values and the actual values for the predicted months.
# Compare the forecast with the 12 held-out actual values
dfForecast = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
yActual = dfData.tail(12)['PercentageChange'].values.tolist()
yPredicted = forecast['yhat'].values.tolist()
plt.plot(dfForecast.ds, yActual, label="Actual")
plt.plot(dfForecast.ds, yPredicted, label="Forecast")
plt.legend()
plt.title("Forecasted Value vs Actuals")
plt.show()
In my opinion, Prophet did a really good job with this forecast, using mostly default settings.
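To back a visual judgment with numbers, you can compute the mean absolute error (MAE), the root mean squared error (RMSE) and the share of actual values that fall inside the uncertainty interval. Here is a minimal NumPy sketch with a hypothetical forecast_errors helper and toy numbers; with the article's variables you would pass yActual, yPredicted, forecast['yhat_lower'] and forecast['yhat_upper'].

```python
import numpy as np


def forecast_errors(actual, predicted, lower=None, upper=None):
    """MAE, RMSE and (optionally) interval coverage for a hold-out forecast.

    Hypothetical helper for illustration; it is not part of Prophet.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    result = {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }
    if lower is not None and upper is not None:
        # Fraction of actual values inside [lower, upper]
        inside = (actual >= np.asarray(lower, dtype=float)) & (actual <= np.asarray(upper, dtype=float))
        result["coverage"] = float(inside.mean())
    return result


# Toy numbers only, not results from the CPI model
scores = forecast_errors([0.2, 0.4, 0.1], [0.3, 0.3, 0.1],
                         lower=[0.0, 0.35, 0.0], upper=[0.5, 0.5, 0.05])
print(scores)
```

With an 80% interval, a coverage far below 0.8 on the hold-out months would suggest the intervals are too narrow.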
What do you think?
Time series forecasting has a myriad of use cases across many industries. It is a must-have tool in your data science toolkit.
Thanks to Facebook, we can truly be a Prophet. The open-source Prophet module is a powerful and flexible tool that can be easily applied to various time series forecasting use cases.