Prediction Collection#

First, let’s fit a vanilla model:

[1]:
if "google.colab" in str(get_ipython()):
    # uninstall preinstalled packages from Colab to avoid conflicts
    !pip uninstall -y torch notebook notebook_shim tensorflow tensorflow-datasets prophet torchaudio torchdata torchtext torchvision
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level

set_log_level("ERROR")
[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)
[2]:
ds y
141 1960-10-01 461
142 1960-11-01 390
143 1960-12-01 432
[ ]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")

Getting the latest forecast df#

We can extract the latest forecast as its own df for data analysis.

[4]:
forecast = m.predict(df)
[5]:
df_fc = m.get_latest_forecast(forecast)
df_fc.head(3)
[5]:
ds y yhat1
0 1960-10-01 461.0 463.035004
1 1960-11-01 390.0 410.434906
2 1960-12-01 432.0 439.753998

Forecasts made a number of steps before the latest forecast can also be included via include_previous_forecasts. Here we include the 5 steps before the latest forecast.

[6]:
df_fc = m.get_latest_forecast(forecast, include_previous_forecasts=5)
df_fc.head(3)
[6]:
ds y yhat6 yhat5 yhat4 yhat3 yhat2 yhat1
0 1960-05-01 472.0 476.862457 None None None None None
1 1960-06-01 535.0 531.84314 527.112732 None None None None
2 1960-07-01 622.0 579.601501 576.99292 590.679077 None None None

Historical data can be included as well by setting include_history_data=True. Be aware that the resulting df can be large.

[7]:
df_fc = m.get_latest_forecast(forecast, include_history_data=True)
df_fc.head(3)
[7]:
ds y yhat1
0 1949-01-01 112.0 None
1 1949-02-01 118.0 None
2 1949-03-01 132.0 None

Collect in-sample predictions#

Predictions sorted based on forecast target#

Calling predict, we get a df_forecast where each 'yhat<i>' refers to the <i>-step-ahead prediction with this row's datetime as the target. Here, <i> refers to the age of the prediction.

For example, yhat3 is the prediction for this datetime that was made 3 steps earlier: it is "3 steps old".

Note that when the forecast extends past the last observation, as in the out-of-sample section below, the last row 1961-03-01 only has a yhat3, which was forecast from the last location with data, 1960-12-01. Because we lack inputs after that location, we have neither the more recent prediction yhat1 from 1961-02-01 nor yhat2 from 1961-01-01.
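The link between a prediction's age and the date it was made can be sketched with plain pandas. This is only an illustration for monthly data; `forecast_origin` is a hypothetical helper, not part of NeuralProphet.

```python
import pandas as pd


def forecast_origin(target_ds, age):
    """Date of the last observation available when a yhat<age> value was made.

    Assumes monthly data, so one step equals one calendar month.
    """
    return pd.Timestamp(target_ds) - pd.DateOffset(months=age)


# yhat3 for 1961-03-01 is "3 steps old": it was forecast from 1960-12-01.
origin_yhat3 = forecast_origin("1961-03-01", age=3)

# A yhat1 for the same target would need data up to 1961-02-01,
# and a yhat2 would need data up to 1961-01-01.
origin_yhat1 = forecast_origin("1961-03-01", age=1)
origin_yhat2 = forecast_origin("1961-03-01", age=2)

print(origin_yhat3.date(), origin_yhat2.date(), origin_yhat1.date())
```

Any origin later than the last row of the input data corresponds to a prediction that cannot exist yet.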

We also get the individual forecast components. Each numbered component is its contribution to yhat<i>, forecast <i> steps ago.

Components without a number suffix depend only on time or on future regressors. They are not lagged and therefore have a single value per row.

[4]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)
[4]:
ds y yhat1 residual1 yhat2 residual2 yhat3 residual3 ar1 ar2 ar3 trend season_yearly
141 1960-10-01 461.0 464.689362 3.689362 467.748444 6.748444 474.838562 13.838562 -217.455673 -214.396606 -207.306473 702.886719 -20.741653
142 1960-11-01 390.0 409.214203 19.214203 408.547119 18.547119 417.346649 27.346649 -265.351379 -266.018463 -257.218933 709.864075 -35.298515
143 1960-12-01 432.0 424.255768 -7.744232 441.038513 9.038513 440.375763 8.375763 -306.486664 -289.703949 -290.366669 716.616394 14.12606
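The columns of this table can be cross-checked against each other. The sketch below copies the three printed rows into a small frame and assumes the additive composition used in this notebook (trend + season_yearly + ar<i>); tiny differences are expected because the printed values are rounded floats.

```python
import pandas as pd

# Values copied from the forecast.tail(3) output above.
rows = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "y": [461.0, 390.0, 432.0],
        "yhat1": [464.689362, 409.214203, 424.255768],
        "yhat2": [467.748444, 408.547119, 441.038513],
        "yhat3": [474.838562, 417.346649, 440.375763],
        "ar1": [-217.455673, -265.351379, -306.486664],
        "ar2": [-214.396606, -266.018463, -289.703949],
        "ar3": [-207.306473, -257.218933, -290.366669],
        "trend": [702.886719, 709.864075, 716.616394],
        "season_yearly": [-20.741653, -35.298515, 14.126060],
    }
)

for i in (1, 2, 3):
    # Unnumbered components are shared by all horizons; ar<i> is horizon-specific.
    rows[f"rebuilt{i}"] = rows["trend"] + rows["season_yearly"] + rows[f"ar{i}"]
    # The residual is the prediction minus the observed value.
    rows[f"residual{i}"] = rows[f"yhat{i}"] - rows["y"]
    # Largest gap between the rebuilt sum and the printed yhat<i>.
    print(i, (rows[f"rebuilt{i}"] - rows[f"yhat{i}"]).abs().max())
```

The rebuilt sums agree with the printed yhat<i> values to within rounding, and the recomputed residuals reproduce the residual<i> columns.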

Predictions based on forecast start#

Calling predict with raw=True, we get a df where each 'step<i>' refers to the <i>th step-ahead prediction starting at this row's datetime. Here, <i> refers to how many steps ahead of the row's datetime the prediction is targeted.

For example, step0 is the prediction for this datetime, and step1 is the prediction for the next datetime.

All predictions in a given row were made at the same time: one step before the row's datestamp.

[5]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)
[5]:
ds step0 step1 step2
136 1960-10-01 464.689362 408.547119 440.375763
137 1960-11-01 409.214203 441.038513 459.443207
138 1960-12-01 424.255768 446.244385 455.264343

Note that the last row contains the last possible forecast: it predicts 1961-01-01, 1961-02-01 and 1961-03-01 using the data available at 1960-12-01.
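The two layouts hold the same numbers arranged differently: step<k> in a row of the raw output equals yhat<k+1> found k rows later in the target-sorted output. The sketch below illustrates this with the three in-sample rows printed earlier. `target_to_start` is a hypothetical helper written for this illustration, not a NeuralProphet function.

```python
import pandas as pd

# yhat<i> values copied from the target-sorted in-sample output.
target = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "yhat1": [464.689362, 409.214203, 424.255768],
        "yhat2": [467.748444, 408.547119, 441.038513],
        "yhat3": [474.838562, 417.346649, 440.375763],
    }
)


def target_to_start(df, n_forecasts):
    """Rearrange yhat<i> columns (sorted by target) into step<k> columns (sorted by start)."""
    out = pd.DataFrame({"ds": df["ds"]})
    for k in range(n_forecasts):
        # The k-step-later target of this row's forecast sits k rows further down.
        out[f"step{k}"] = df[f"yhat{k + 1}"].shift(-k)
    return out


start = target_to_start(target, n_forecasts=3)
print(start)
```

The first row reproduces the raw row for 1960-10-01. Later rows contain missing values here only because this excerpt stops at 1960-12-01, while the raw output also includes predictions for dates after the data ends.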

Setting decompose=True will include the individual forecast components, each of which is its contribution to the prediction step<i> steps into the future.

[6]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)
[6]:
ds step0 step1 step2 trend0 trend1 trend2 season_yearly0 season_yearly1 season_yearly2 ar0 ar1 ar2
136 1960-10-01 464.689362 408.547119 440.375763 702.886719 709.864075 716.616394 -20.741653 -35.298515 14.126060 -217.455673 -266.018463 -290.366669
137 1960-11-01 409.214203 441.038513 459.443207 709.864075 716.616394 723.593689 -35.298515 14.126060 5.574303 -265.351379 -289.703949 -269.724792
138 1960-12-01 424.255768 446.244385 455.264343 716.616394 723.593689 730.571045 14.126060 5.574303 -30.433420 -306.486664 -282.923584 -244.873322

Collect out-of-sample predictions#

This is how you can extend predictions into the unknown future:

[7]:
df = pd.read_csv(data_location + "air_passengers.csv")
future = m.make_future_dataframe(df, periods=3)  # periods=m.n_forecasts, n_historic_predictions=False

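The future dataframe keeps the most recent observations, which the model needs as inputs, followed by empty rows for the dates to predict. As a result, the forecast will only contain predictions about the yet unobserved future.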

[8]:
future.tail()
[8]:
ds y
3 1960-11-01 390
4 1960-12-01 432
5 1961-01-01 None
6 1961-02-01 None
7 1961-03-01 None
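The shape of this frame can be imitated with plain pandas: keep the last rows needed as lags, then append empty rows at the data frequency. This is only a sketch of the resulting layout, not NeuralProphet's implementation; `sketch_future_frame` is a hypothetical helper, and it uses n_lags=3 because only the last three observations are printed in this notebook (the model above uses n_lags=5, so its frame keeps five rows of history).

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def sketch_future_frame(df, n_lags, periods, freq="MS"):
    """Last n_lags observed rows followed by `periods` rows with missing y."""
    history = df.tail(n_lags)
    # The first future timestamp is one frequency step after the last observation.
    first_future = history["ds"].iloc[-1] + to_offset(freq)
    future_ds = pd.date_range(start=first_future, periods=periods, freq=freq)
    ds = list(history["ds"]) + list(future_ds)
    y = list(history["y"].astype(float)) + [float("nan")] * periods
    return pd.DataFrame({"ds": ds, "y": y})


# The last three observations shown by df.tail(3) earlier.
observed = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "y": [461, 390, 432],
    }
)

future_sketch = sketch_future_frame(observed, n_lags=3, periods=3)
print(future_sketch)
```

The appended rows carry the dates 1961-01-01 through 1961-03-01 with missing y, matching the tail of the frame shown above.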

Predictions based on forecast target#

[9]:
forecast = m.predict(future)
forecast.tail(3)
[9]:
ds y yhat1 residual1 yhat2 residual2 yhat3 residual3 ar1 ar2 ar3 trend season_yearly
5 1961-01-01 NaN 453.751007 NaN None NaN None NaN -275.416962 None None 723.593689 5.574303
6 1961-02-01 NaN None NaN 463.336273 NaN None NaN None -236.801361 None 730.571045 -30.43342
7 1961-03-01 NaN None NaN None NaN 522.325989 NaN None None -191.955765 736.87323 -22.591486
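In this table each future row carries exactly one prediction, on the diagonal of the yhat<i> columns. A minimal pandas sketch for collapsing that diagonal into a single column is shown below. It uses NaN for the missing entries (the printed frame shows None) and is an illustration only, not how the library assembles its outputs.

```python
import numpy as np
import pandas as pd

# yhat<i> values copied from the out-of-sample output above.
future_fc = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1961-01-01", "1961-02-01", "1961-03-01"]),
        "yhat1": [453.751007, np.nan, np.nan],
        "yhat2": [np.nan, 463.336273, np.nan],
        "yhat3": [np.nan, np.nan, 522.325989],
    }
)

yhat_cols = ["yhat1", "yhat2", "yhat3"]

# Back-fill across columns so the first column holds each row's only value.
collapsed = future_fc[yhat_cols].bfill(axis=1).iloc[:, 0].rename("yhat")
combined = pd.concat([future_fc["ds"], collapsed], axis=1)
print(combined)
```

All three values come from the same forecast made with data up to 1960-12-01, one for each step ahead.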

Predictions based on forecast start#

We can also get the forecasts based on the forecast start. Here, each stepX refers to the prediction X steps after the datestamp ds.

[10]:
forecast = m.predict(future, raw=True, decompose=False)
forecast
[10]:
ds step0 step1 step2
0 1961-01-01 453.751007 463.336273 522.325989