Prediction Collection#

First, let’s fit a vanilla model:

[1]:

if "google.colab" in str(get_ipython()):
    # uninstall preinstalled packages from Colab to avoid conflicts
    !pip uninstall -y torch notebook notebook_shim tensorflow tensorflow-datasets prophet torchaudio torchdata torchtext torchvision
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level

set_log_level("ERROR")

[2]:

data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)

[2]:

	ds	y
141	1960-10-01	461
142	1960-11-01	390
143	1960-12-01	432

[ ]:

m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")

Getting the latest forecast df#

The latest forecast can be retrieved as a df for further analysis.

[4]:

forecast = m.predict(df)

[5]:

df_fc = m.get_latest_forecast(forecast)
df_fc.head(3)

[5]:

	ds	y	yhat1
0	1960-10-01	461.0	463.035004
1	1960-11-01	390.0	410.434906
2	1960-12-01	432.0	439.753998

Forecasts made before the latest one can be included as well. Here we include the 5 forecasts preceding the latest forecast.

[6]:

df_fc = m.get_latest_forecast(forecast, include_previous_forecasts=5)
df_fc.head(3)

[6]:

	ds	y	yhat6	yhat5	yhat4	yhat3	yhat2	yhat1
0	1960-05-01	472.0	476.862457	None	None	None	None	None
1	1960-06-01	535.0	531.84314	527.112732	None	None	None	None
2	1960-07-01	622.0	579.601501	576.99292	590.679077	None	None	None

Historical data can be included too, but be aware that the resulting df can be large.

[7]:

df_fc = m.get_latest_forecast(forecast, include_history_data=True)
df_fc.head(3)

[7]:

	ds	y	yhat1
0	1949-01-01	112.0	None
1	1949-02-01	118.0	None
2	1949-03-01	132.0	None

Collect in-sample predictions#

Predictions sorted based on forecast target#

Calling predict, we get a df_forecast where each yhatX refers to the X-step-ahead prediction with this row’s datetime as the target. Here, X refers to the age of the prediction.

E.g. yhat3 is the prediction for this datetime that was made 3 steps ago; it is “3 steps old”.

Note that a row beyond the observed data, such as the last row 1961-03-01 of the out-of-sample forecast further below, only has a yhat3, which was forecast at the last location with data, 1960-12-01. Because we lack inputs after that location, we have neither the more recent prediction yhat1 from 1961-02-01 nor yhat2 from 1961-01-01.
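The relation between a prediction's age and the time it was made can be sketched with plain pandas. This is only an illustration: forecast_origin is a hypothetical helper, not part of NeuralProphet, and it assumes monthly data on month starts as in this example.

```python
import pandas as pd


def forecast_origin(target, age):
    """Return the datetime of the last observation behind a prediction.

    Hypothetical helper: assumes monthly steps, so a prediction that is
    `age` steps old was made `age` months before its target datetime.
    """
    return pd.Timestamp(target) - pd.DateOffset(months=age)


# yhat3 for 1961-03-01 was forecast from the last observed month.
print(forecast_origin("1961-03-01", 3).date())
# yhat1 for the same target would need an observation one month earlier.
print(forecast_origin("1961-03-01", 1).date())
```

The second date lies after the last observation in the data, which is why no yhat1 exists for that target.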

We also get the individual forecast components, which likewise refer to their respective contribution to yhatX, forecast X steps ago.

Components without an appended number depend only on time or on future regressors. Neither is lagged, so each has a single value per row.

[4]:

df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)

[4]:

	ds	y	yhat1	residual1	yhat2	residual2	yhat3	residual3	ar1	ar2	ar3	trend	season_yearly
141	1960-10-01	461.0	464.689362	3.689362	467.748444	6.748444	474.838562	13.838562	-217.455673	-214.396606	-207.306473	702.886719	-20.741653
142	1960-11-01	390.0	409.214203	19.214203	408.547119	18.547119	417.346649	27.346649	-265.351379	-266.018463	-257.218933	709.864075	-35.298515
143	1960-12-01	432.0	424.255768	-7.744232	441.038513	9.038513	440.375763	8.375763	-306.486664	-289.703949	-290.366669	716.616394	14.12606
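As a quick sanity check, the residual columns in the output above are consistent with residual = yhat - y. The sketch below rebuilds a small frame from the printed values (copied by hand, so it does not need a fitted model) and recomputes residual1.

```python
import pandas as pd

# Values copied from the forecast.tail(3) output above.
fc = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "y": [461.0, 390.0, 432.0],
        "yhat1": [464.689362, 409.214203, 424.255768],
        "residual1": [3.689362, 19.214203, -7.744232],
    }
)

# Recompute the residual as prediction minus observation.
recomputed = fc["yhat1"] - fc["y"]

# Largest deviation from the printed residual column.
max_deviation = (recomputed - fc["residual1"]).abs().max()
print(max_deviation)
```

A positive residual therefore means the model over-predicted the observed value.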

Predictions based on forecast start#

Calling predict with raw=True, we get a df where each stepX refers to the X-th step-ahead prediction starting at this row’s datetime. Here, X refers to how many steps ahead the prediction is targeted at.

E.g. step0 is the prediction for this datetime. step1 is the prediction for the next datetime.

All the predictions of a particular row were made at the same time: one step before the row’s datestamp.

[5]:

df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)

[5]:

	ds	step0	step1	step2
136	1960-10-01	464.689362	408.547119	440.375763
137	1960-11-01	409.214203	441.038513	459.443207
138	1960-12-01	424.255768	446.244385	455.264343

Note that the last row shown, dated 1960-12-01, forecasts 1960-12-01, 1961-01-01 and 1961-02-01 with the data available up to 1960-11-01. Its step0 equals the yhat1 of the same date in the target-based output above.
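The raw view and the target-based view hold the same numbers in a different arrangement. Going by the two outputs above, stepX in a row dated ds matches yhat(X+1) in the row dated X steps later. The sketch below re-indexes the printed raw values by their target datetime; it uses plain pandas on values copied by hand and assumes monthly steps.

```python
import pandas as pd

# Values copied from the raw forecast.tail(3) output above.
raw = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "step0": [464.689362, 409.214203, 424.255768],
        "step1": [408.547119, 441.038513, 446.244385],
        "step2": [440.375763, 459.443207, 455.264343],
    }
)

columns = {}
for i in range(3):
    # step<i> of a row dated ds targets the datetime i months later.
    target = pd.DatetimeIndex(raw["ds"] + pd.DateOffset(months=i))
    # At its target, that prediction is i + 1 steps old.
    columns[f"yhat{i + 1}"] = pd.Series(raw[f"step{i}"].to_numpy(), index=target)

# Building the frame from Series aligns the columns on the union of targets.
by_target = pd.DataFrame(columns).sort_index()
print(by_target)
```

The row for 1960-12-01 reproduces the yhat1, yhat2 and yhat3 values of the target-based output, while rows near the edges stay partly empty because only three raw rows were copied.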

Setting decompose=True will include the individual forecast components, where each component column with suffix X refers to its respective contribution to stepX, X steps into the future.

[6]:

df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)

[6]:

	ds	step0	step1	step2	trend0	trend1	trend2	season_yearly0	season_yearly1	season_yearly2	ar0	ar1	ar2
136	1960-10-01	464.689362	408.547119	440.375763	702.886719	709.864075	716.616394	-20.741653	-35.298515	14.126060	-217.455673	-266.018463	-290.366669
137	1960-11-01	409.214203	441.038513	459.443207	709.864075	716.616394	723.593689	-35.298515	14.126060	5.574303	-265.351379	-289.703949	-269.724792
138	1960-12-01	424.255768	446.244385	455.264343	716.616394	723.593689	730.571045	14.126060	5.574303	-30.433420	-306.486664	-282.923584	-244.873322
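In this output the components appear to combine additively: for each step, trend + season_yearly + ar gives the step value up to small rounding differences. The check below uses the printed values of the last row, copied by hand. It assumes the additive components of this particular model and is not a general statement about every NeuralProphet configuration.

```python
import pandas as pd

# Values copied from the last row (1960-12-01) of the decomposed output above.
row = pd.Series(
    {
        "step0": 424.255768, "trend0": 716.616394, "season_yearly0": 14.126060, "ar0": -306.486664,
        "step1": 446.244385, "trend1": 723.593689, "season_yearly1": 5.574303, "ar1": -282.923584,
        "step2": 455.264343, "trend2": 730.571045, "season_yearly2": -30.433420, "ar2": -244.873322,
    }
)

deviations = []
for i in range(3):
    # Sum the three components that belong to step<i>.
    total = row[f"trend{i}"] + row[f"season_yearly{i}"] + row[f"ar{i}"]
    deviations.append(abs(total - row[f"step{i}"]))

# The deviations are tiny compared to the forecast values themselves.
print(max(deviations))
```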

Collect out-of-sample predictions#

This is how you can extend predictions into the unknown future:

[7]:

df = pd.read_csv(data_location + "air_passengers.csv")
future = m.make_future_dataframe(df, periods=3)  # periods=m.n_forecasts, n_historic_predictions=False

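The future dataframe contains the last observations needed as model inputs, followed by one row with a missing y for each step of the yet unobserved future. Predicting on it yields forecasts for those future steps only.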

[8]:

future.tail()

[8]:

	ds	y
3	1960-11-01	390
4	1960-12-01	432
5	1961-01-01	None
6	1961-02-01	None
7	1961-03-01	None
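The appended rows simply continue the monthly datestamps past the last observation. As a rough illustration of what those rows look like (this is not how make_future_dataframe is implemented, and the variable names are made up for the example), the same datestamps can be generated with pandas:

```python
import pandas as pd

last_observed = pd.Timestamp("1960-12-01")
n_future = 3  # assumption: same as periods=3 above

# Generate month starts beginning at the last observation, then drop
# that first entry so only the unobserved months remain.
future_ds = pd.date_range(start=last_observed, periods=n_future + 1, freq="MS")[1:]

# y is unknown for these rows, so it is left missing.
future_rows = pd.DataFrame({"ds": future_ds, "y": [float("nan")] * n_future})
print(future_rows)
```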

Predictions based on forecast target#

[9]:

forecast = m.predict(future)
forecast.tail(3)

[9]:

	ds	y	yhat1	residual1	yhat2	residual2	yhat3	residual3	ar1	ar2	ar3	trend	season_yearly
5	1961-01-01	NaN	453.751007	NaN	None	NaN	None	NaN	-275.416962	None	None	723.593689	5.574303
6	1961-02-01	NaN	None	NaN	463.336273	NaN	None	NaN	None	-236.801361	None	730.571045	-30.43342
7	1961-03-01	NaN	None	NaN	None	NaN	522.325989	NaN	None	None	-191.955765	736.87323	-22.591486

Predictions based on forecast start#

We can also get the forecasts based on the forecast start. Here, each stepX refers to the prediction X steps ahead of the datestamp ds.

[10]:

forecast = m.predict(future, raw=True, decompose=False)
forecast

[10]:

	ds	step0	step1	step2
0	1961-01-01	453.751007	463.336273	522.325989