
Prediction Collection

Collect Predictions

First, let’s fit a vanilla model:

[1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")
[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)
[2]:
ds y
141 1960-10-01 461
142 1960-11-01 390
143 1960-12-01 432
[3]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")

Collect in-sample predictions

Predictions based on forecast target

Calling predict returns a forecast dataframe in which each column 'yhat<i>' holds the <i>-step-ahead prediction whose target is this row’s datetime. Here, <i> is the age of the prediction.

For example, yhat3 is the prediction for this row’s datetime that was made 3 steps earlier: it is “3 steps old”.

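Note that when the dataframe is extended into the future (see Collect out-of-sample predictions below), the last row, 1961-03-01, only has a yhat3, which was forecast from the last date with data, 1960-12-01. Because we lack inputs after that date, there are no more recent predictions: no yhat1 made with data up to 1961-02-01 and no yhat2 made with data up to 1961-01-01.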
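The age convention can be made concrete with a small pandas sketch, assuming monthly data as in this example. The helpers `forecast_origin` and `possible_ages` are hypothetical (not part of NeuralProphet); they only encode the rule that yhat<i> for a target date was made with data up to <i> steps earlier, so it cannot exist if that origin lies after the last observation. This is a necessary condition only: in-sample, a prediction also needs n_lags observed values before its origin.

```python
import pandas as pd

def forecast_origin(target: str, age: int) -> pd.Timestamp:
    """Hypothetical helper: last date of data used for yhat<age> at `target` (monthly data assumed)."""
    return pd.Timestamp(target) - pd.DateOffset(months=age)

def possible_ages(target: str, last_observed: str, n_forecasts: int) -> list[int]:
    """Hypothetical helper: ages 1..n_forecasts whose forecast origin is not after the last observation."""
    last = pd.Timestamp(last_observed)
    return [i for i in range(1, n_forecasts + 1) if forecast_origin(target, i) <= last]

print(forecast_origin("1961-03-01", 3))               # 1960-12-01 00:00:00
print(possible_ages("1961-03-01", "1960-12-01", 3))   # [3]
```

For 1961-03-01, ages 1 and 2 would need data up to 1961-02-01 and 1961-01-01, which do not exist yet, so only yhat3 survives.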

We also get the individual forecast components. A numbered component such as ar<i> is its contribution to yhat<i>, forecasted <i> steps ago.

Components without a numeric suffix (here trend and season_yearly) depend only on time or on future regressors; since they are not lagged, each has a single value per row.

[4]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)
[4]:
ds y yhat1 residual1 yhat2 residual2 yhat3 residual3 ar1 ar2 ar3 trend season_yearly
141 1960-10-01 461.0 465.914337 4.914337 471.379517 10.379517 478.984253 17.984253 -198.053421 -192.588257 -184.98349 683.929749 -19.961983
142 1960-11-01 390.0 409.738464 19.738464 410.88266 20.88266 422.372314 32.372314 -246.174759 -245.030563 -233.540924 690.632263 -34.719048
143 1960-12-01 432.0 421.198425 -10.801575 440.99115 8.99115 441.647461 9.647461 -287.63382 -267.841095 -267.184784 697.118591 11.713625
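The columns above fit together arithmetically. The sketch below copies the three printed rows and checks two relations, assuming NeuralProphet's default additive mode (no multiplicative seasonality): each yhat<i> is trend + season_yearly + ar<i>, and, as the printed numbers show, residual<i> is yhat<i> minus y. Small differences of about 1e-5 come from the model computing in float32.

```python
import pandas as pd

# Values copied from the forecast.tail(3) output above (float32 results, so expect ~1e-5 noise)
forecast = pd.DataFrame({
    "y": [461.0, 390.0, 432.0],
    "yhat1": [465.914337, 409.738464, 421.198425],
    "yhat2": [471.379517, 410.88266, 440.99115],
    "yhat3": [478.984253, 422.372314, 441.647461],
    "residual1": [4.914337, 19.738464, -10.801575],
    "ar1": [-198.053421, -246.174759, -287.63382],
    "ar2": [-192.588257, -245.030563, -267.841095],
    "ar3": [-184.98349, -233.540924, -267.184784],
    "trend": [683.929749, 690.632263, 697.118591],
    "season_yearly": [-19.961983, -34.719048, 11.713625],
}, index=pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]))

# Additive model (assumed): yhat<i> = trend + season_yearly + ar<i>;
# the unlagged components are shared by all ages, only ar<i> differs
max_error = {
    i: (forecast["trend"] + forecast["season_yearly"] + forecast[f"ar{i}"]
        - forecast[f"yhat{i}"]).abs().max()
    for i in (1, 2, 3)
}

# residual<i> = yhat<i> - y, so a positive residual means the model over-predicted
residual_error = (forecast["yhat1"] - forecast["y"] - forecast["residual1"]).abs().max()
print(max_error, residual_error)
```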

Predictions based on forecast start

Calling predict with raw=True, we get a df where each 'step<i>' refers to the prediction for <i> steps after this row’s datetime. Here, <i> counts, starting from zero, how many steps past the row’s datetime the prediction is targeted at.

e.g. step0 is the prediction for this datetime; step1 is the prediction for the next datetime.

All the predictions in a row were made at the same time: one step before the row’s datestamp.

[5]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)
[5]:
ds step0 step1 step2
136 1960-10-01 465.914337 410.882660 441.647461
137 1960-11-01 409.738464 440.991150 458.693176
138 1960-12-01 421.198425 443.388397 456.959534

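Note that the last row, 1960-12-01, was forecast with data available up to 1960-11-01 and covers 1960-12-01, 1961-01-01 and 1961-02-01. The last possible forecast, covering 1961-01-01, 1961-02-01 and 1961-03-01 with data available at 1960-12-01, requires extending the dataframe into the future, as shown in the out-of-sample section below.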
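The two layouts hold the same numbers: step<k> in the row for ds equals yhat<k+1> in the row for ds + k steps, since both were made one step before ds. The pandas sketch below rebuilds forecast-start columns from the target-based rows printed earlier; `target_to_start` is a hypothetical helper, not a NeuralProphet function, and it assumes rows sorted by ds with no gaps in the frequency.

```python
import pandas as pd

# Target-based rows copied from the predict(df) output above
by_target = pd.DataFrame({
    "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
    "yhat1": [465.914337, 409.738464, 421.198425],
    "yhat2": [471.379517, 410.88266, 440.99115],
    "yhat3": [478.984253, 422.372314, 441.647461],
})

def target_to_start(df: pd.DataFrame, n_forecasts: int) -> pd.DataFrame:
    """Hypothetical helper: step<k> at ds is yhat<k+1> found k rows later (at ds + k steps)."""
    out = df[["ds"]].copy()
    for k in range(n_forecasts):
        # shift(-k) pulls the value from k rows later; rows whose target lies
        # past the end of this truncated table become NaN
        out[f"step{k}"] = df[f"yhat{k + 1}"].shift(-k)
    return out

by_start = target_to_start(by_target, n_forecasts=3)
print(by_start)
```

The first rebuilt row (1960-10-01) matches row 136 of the raw output, and the later rows are only partially filled because their targets fall outside the three copied rows.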

Setting decompose=True also includes the individual forecast components; each numbered component, such as trend<i>, is its contribution to step<i> into the future.

[6]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)
[6]:
ds step0 step1 step2 trend0 trend1 trend2 season_yearly0 season_yearly1 season_yearly2 ar0 ar1 ar2
136 1960-10-01 465.914337 410.882660 441.647461 683.929749 690.632263 697.118591 -19.961983 -34.719048 11.713625 -198.053421 -245.030563 -267.184784
137 1960-11-01 409.738464 440.991150 458.693176 690.632263 697.118591 703.821167 -34.719048 11.713625 3.806945 -246.174759 -267.841095 -248.934937
138 1960-12-01 421.198425 443.388397 456.959534 697.118591 703.821167 710.523743 11.713625 3.806945 -24.743301 -287.633820 -264.239685 -228.820923
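Because every column name ends in the step number, the wide output can be turned into one row per (forecast start, step) with pandas' `wide_to_long`. The sketch below does this for the three printed rows, adds each prediction's target date (assuming monthly data), and checks that the components add up to the prediction (assuming the default additive mode). The column name `h` and the variable names are illustrative choices, not part of NeuralProphet.

```python
import pandas as pd

# Rows copied from the decompose=True output above
raw = pd.DataFrame({
    "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
    "step0": [465.914337, 409.738464, 421.198425],
    "step1": [410.882660, 440.991150, 443.388397],
    "step2": [441.647461, 458.693176, 456.959534],
    "trend0": [683.929749, 690.632263, 697.118591],
    "trend1": [690.632263, 697.118591, 703.821167],
    "trend2": [697.118591, 703.821167, 710.523743],
    "season_yearly0": [-19.961983, -34.719048, 11.713625],
    "season_yearly1": [-34.719048, 11.713625, 3.806945],
    "season_yearly2": [11.713625, 3.806945, -24.743301],
    "ar0": [-198.053421, -246.174759, -287.633820],
    "ar1": [-245.030563, -267.841095, -264.239685],
    "ar2": [-267.184784, -248.934937, -228.820923],
})

# One row per (ds, h): the numeric suffix of each column becomes the index level "h"
long_form = pd.wide_to_long(
    raw, stubnames=["step", "trend", "season_yearly", "ar"], i="ds", j="h"
).sort_index()

# Target date of each prediction: ds + h months (monthly data assumed)
long_form["target"] = [ds + pd.DateOffset(months=int(h)) for ds, h in long_form.index]

# Additive check (assumed model): step<h> = trend<h> + season_yearly<h> + ar<h>
error = (long_form["step"] - long_form["trend"]
         - long_form["season_yearly"] - long_form["ar"]).abs().max()
print(long_form)
print(error)
```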

Collect out-of-sample predictions

This is how you can extend predictions into the unknown future:

[7]:
df = pd.read_csv(data_location + "air_passengers.csv")
future = m.make_future_dataframe(df)  # recent history plus empty rows for the dates to forecast

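NeuralProphet's `make_future_dataframe` prepares the input for out-of-sample predictions. Judging by the indices 0 to 7 in the outputs that follow, it appears to hold the last n_lags=5 observed rows followed by n_forecasts=3 new dates with y left empty. The pandas sketch below builds a frame of that shape by hand to show the layout; `build_future_frame` is a hypothetical helper, not the library's implementation, and the synthetic `y` values stand in for the CSV.

```python
import pandas as pd

def build_future_frame(history: pd.DataFrame, n_lags: int, n_forecasts: int,
                       freq: str = "MS") -> pd.DataFrame:
    """Hypothetical helper: last n_lags observed rows plus n_forecasts empty future rows."""
    recent = history[["ds", "y"]].tail(n_lags)
    last = recent["ds"].iloc[-1]
    # date_range includes `last` itself when it lies on the freq grid, so drop the first entry
    future_ds = pd.date_range(start=last, periods=n_forecasts + 1, freq=freq)[1:]
    future_rows = pd.DataFrame({"ds": future_ds, "y": float("nan")})
    return pd.concat([recent, future_rows], ignore_index=True)

# Synthetic stand-in for air_passengers.csv: 144 monthly rows, 1949-01 to 1960-12
history = pd.DataFrame({
    "ds": pd.date_range("1949-01-01", periods=144, freq="MS"),
    "y": range(144),
})
future_sketch = build_future_frame(history, n_lags=5, n_forecasts=3)
print(future_sketch)
```

With this layout, only one forecast origin (1960-12-01) has a full window of 5 lags, which is why each future row below receives a single prediction.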
Predictions based on forecast target

[9]:
forecast = m.predict(future)
forecast.tail(3)
[9]:
ds y yhat1 residual1 yhat2 residual2 yhat3 residual3 ar1 ar2 ar3 trend season_yearly
5 1961-01-01 NaN 451.707611 NaN None NaN None NaN -255.920502 None None 703.821167 3.806945
6 1961-02-01 NaN None NaN 465.932037 NaN None NaN None -219.848434 None 710.523743 -24.743301
7 1961-03-01 NaN None NaN None NaN 525.330139 NaN None None -174.258484 716.577637 -16.989017

Predictions based on forecast start

We can also get the forecasts based on the forecast start:

[10]:
forecast = m.predict(future, raw=True, decompose=False)
forecast
[10]:
ds step0 step1 step2
0 1961-01-01 451.707611 465.932037 525.330139
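This single row is the same information as the three target-based rows shown earlier, laid out differently: step<k> targets the start date plus k months and is k+1 steps old, so it lands in column yhat<k+1>. The sketch below rebuilds the diagonal target-based table from the raw row, assuming monthly data; NaN plays the role of the None entries in the earlier output.

```python
import numpy as np
import pandas as pd

# The raw row above: forecast start 1961-01-01, predictions for steps 0..2
start = pd.Timestamp("1961-01-01")
steps = [451.707611, 465.932037, 525.330139]
n_forecasts = len(steps)

# step<k> targets start + k months and is k+1 steps old, so it goes to column yhat<k+1>
targets = pd.date_range(start, periods=n_forecasts, freq="MS")
by_target = pd.DataFrame(np.nan, index=targets,
                         columns=[f"yhat{i}" for i in range(1, n_forecasts + 1)])
for k, value in enumerate(steps):
    by_target.iloc[k, k] = value
print(by_target)
```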

Advanced: Get predictions based on forecast start as arrays

The internal (underscore-prefixed) methods used below are not meant to be called directly, but if you have a specific need, it may be useful to get the values directly as arrays. The returned predictions are also based on the forecast origin.

[11]:
# Internal helpers: their signatures may change between NeuralProphet versions
df = m._prepare_dataframe_to_predict(future)
dates, predicted, components = m._predict_raw(df, include_components=True)
[12]:
dates[-3:]
[12]:
5   1961-01-01
Name: ds, dtype: datetime64[ns]
[13]:
predicted[-3:]
[13]:
array([[451.7076 , 465.93204, 525.33014]], dtype=float32)
[14]:
[(key, values[-3:]) for key, values in components.items()]
[14]:
[('trend', array([[703.82117, 710.52374, 716.57764]], dtype=float32)),
 ('season_yearly',
  array([[  3.806945, -24.743301, -16.989017]], dtype=float32)),
 ('ar', array([[-255.9205 , -219.84843, -174.25848]], dtype=float32))]
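Since `_predict_raw` is private, its exact return types may change between versions. Based on the printed output above, `predicted` and each value in `components` appear to be 2-D float32 arrays of shape (number of forecast origins, n_forecasts). The sketch below copies those printed values and flattens them into the step<i> / <component><i> column layout of predict(raw=True, decompose=True), then checks that the components add up to the prediction (assuming the default additive mode).

```python
import numpy as np
import pandas as pd

# Arrays as printed above: one forecast origin, n_forecasts = 3 steps
predicted = np.array([[451.7076, 465.93204, 525.33014]], dtype=np.float32)
components = {
    "trend": np.array([[703.82117, 710.52374, 716.57764]], dtype=np.float32),
    "season_yearly": np.array([[3.806945, -24.743301, -16.989017]], dtype=np.float32),
    "ar": np.array([[-255.9205, -219.84843, -174.25848]], dtype=np.float32),
}
dates = pd.to_datetime(["1961-01-01"])

# Build the columns in the same order as the decompose=True output
n_forecasts = predicted.shape[1]
columns = {"ds": dates}
for i in range(n_forecasts):
    columns[f"step{i}"] = predicted[:, i]
for name, values in components.items():
    for i in range(n_forecasts):
        columns[f"{name}{i}"] = values[:, i]
table = pd.DataFrame(columns)

# Sanity check: summing the component arrays should reproduce `predicted` up to float32 noise
total = sum(components.values())
print(table)
print(np.abs(total - predicted).max())
```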