Prediction Collection
First, let’s fit a vanilla model:
[1]:
if "google.colab" in str(get_ipython()):
    # uninstall preinstalled packages from Colab to avoid conflicts
    !pip uninstall -y torch notebook notebook_shim tensorflow tensorflow-datasets prophet torchaudio torchdata torchtext torchvision
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")
[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)
[2]:
| ds | y |
---|---|---|
141 | 1960-10-01 | 461 |
142 | 1960-11-01 | 390 |
143 | 1960-12-01 | 432 |
[ ]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")
Getting the latest forecast df
We can retrieve the latest forecast as a df with get_latest_forecast, e.g. for further data analysis.
[4]:
forecast = m.predict(df)
[5]:
df_fc = m.get_latest_forecast(forecast)
df_fc.head(3)
[5]:
| ds | y | yhat1 |
---|---|---|---|
0 | 1960-10-01 | 461.0 | 463.035004 |
1 | 1960-11-01 | 390.0 | 410.434906 |
2 | 1960-12-01 | 432.0 | 439.753998 |
Forecasts from a number of steps before the latest forecast can also be included with include_previous_forecasts. Here we include the 5 steps before the latest forecast.
[6]:
df_fc = m.get_latest_forecast(forecast, include_previous_forecasts=5)
df_fc.head(3)
[6]:
| ds | y | yhat6 | yhat5 | yhat4 | yhat3 | yhat2 | yhat1 |
---|---|---|---|---|---|---|---|---|
0 | 1960-05-01 | 472.0 | 476.862457 | None | None | None | None | None |
1 | 1960-06-01 | 535.0 | 531.84314 | 527.112732 | None | None | None | None |
2 | 1960-07-01 | 622.0 | 579.601501 | 576.99292 | 590.679077 | None | None | None |
Historical data can also be included with include_history_data=True; be aware, however, that the resulting df can be large.
[7]:
df_fc = m.get_latest_forecast(forecast, include_history_data=True)
df_fc.head(3)
[7]:
| ds | y | yhat1 |
---|---|---|---|
0 | 1949-01-01 | 112.0 | None |
1 | 1949-02-01 | 118.0 | None |
2 | 1949-03-01 | 132.0 | None |
Collect in-sample predictions
Predictions based on forecast target
Calling predict, we get a df_forecast where each 'yhat<i>' refers to the <i>-step-ahead prediction whose target is this row's datetime. Here, <i> refers to the age of the prediction: e.g. yhat3 is the prediction for this datetime made 3 steps ago, so it is "3 steps old".
Note that when the df is extended into the future (see Collect out-of-sample predictions), the last row, 1961-03-01, only has a yhat3, which was forecast at the last date with data, 1960-12-01. Because we lack inputs after that date, there are no more recent predictions: no yhat1 from 1961-02-01 and no yhat2 from 1961-01-01.
We also get the individual forecast components, which likewise refer to their respective contribution to yhat<i>, forecast <i> steps ago.
Components without a numeric suffix (here trend and season_yearly) depend only on time or on future regressors; they are not lagged and thus have a single value per row.
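The age <i> of a prediction maps directly to the last date with data at the time it was made. A minimal sketch, assuming one step equals one month as in this monthly (freq="MS") dataset; prediction_origin is a hypothetical helper, not part of NeuralProphet:

```python
import pandas as pd


def prediction_origin(target: str, age: int) -> pd.Timestamp:
    """Hypothetical helper: last date with data when yhat<age> for `target` was forecast.

    Assumes one step equals one month, as in the monthly (freq="MS") data here.
    """
    return pd.Timestamp(target) - pd.DateOffset(months=age)


print(prediction_origin("1961-03-01", age=3))  # 1960-12-01 00:00:00
print(prediction_origin("1960-10-01", age=1))  # 1960-09-01 00:00:00
```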
[4]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)
[4]:
| ds | y | yhat1 | residual1 | yhat2 | residual2 | yhat3 | residual3 | ar1 | ar2 | ar3 | trend | season_yearly |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
141 | 1960-10-01 | 461.0 | 464.689362 | 3.689362 | 467.748444 | 6.748444 | 474.838562 | 13.838562 | -217.455673 | -214.396606 | -207.306473 | 702.886719 | -20.741653 |
142 | 1960-11-01 | 390.0 | 409.214203 | 19.214203 | 408.547119 | 18.547119 | 417.346649 | 27.346649 | -265.351379 | -266.018463 | -257.218933 | 709.864075 | -35.298515 |
143 | 1960-12-01 | 432.0 | 424.255768 | -7.744232 | 441.038513 | 9.038513 | 440.375763 | 8.375763 | -306.486664 | -289.703949 | -290.366669 | 716.616394 | 14.12606 |
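In the rows shown, each residual<i> equals yhat<i> minus y (for 1960-10-01, 464.689362 minus 461 gives 3.689362). This makes it straightforward to summarize the error separately for each forecast age. The sketch below retypes the three rows above into a small df named fc; with a real model you would apply the same logic to the forecast df returned by predict. Three rows are far too few for a meaningful error estimate and are used here only for illustration.

```python
import pandas as pd

# The three rows shown above, retyped from the predict() output.
fc = pd.DataFrame(
    {
        "ds": ["1960-10-01", "1960-11-01", "1960-12-01"],
        "y": [461.0, 390.0, 432.0],
        "yhat1": [464.689362, 409.214203, 424.255768],
        "yhat2": [467.748444, 408.547119, 441.038513],
        "yhat3": [474.838562, 417.346649, 440.375763],
        "residual1": [3.689362, 19.214203, -7.744232],
        "residual2": [6.748444, 18.547119, 9.038513],
        "residual3": [13.838562, 27.346649, 8.375763],
    }
)

n_forecasts = 3
for i in range(1, n_forecasts + 1):
    # In these rows, residual<i> is yhat<i> minus y
    recomputed = fc[f"yhat{i}"] - fc["y"]
    assert (recomputed - fc[f"residual{i}"]).abs().max() < 1e-6

# Mean absolute error per forecast age; mean() skips missing predictions by default
mae = {f"yhat{i}": (fc[f"yhat{i}"] - fc["y"]).abs().mean() for i in range(1, n_forecasts + 1)}
print(mae)
```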
Predictions based on forecast start
Calling predict with raw=True, we get a df where each 'step<i>' refers to the prediction <i> steps ahead, starting at this row's datetime. Here, <i> refers to how many steps after this row's datetime the prediction is targeted.
E.g. step0 is the prediction for this datetime, and step1 is the prediction for the next datetime.
All the predictions of a particular row were made at the same time: one step before the row's datestamp.
[5]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)
[5]:
| ds | step0 | step1 | step2 |
---|---|---|---|---|
136 | 1960-10-01 | 464.689362 | 408.547119 | 440.375763 |
137 | 1960-11-01 | 409.214203 | 441.038513 | 459.443207 |
138 | 1960-12-01 | 424.255768 | 446.244385 | 455.264343 |
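Note that the last row, 1960-12-01, contains the last forecast that can be made from this df: it forecasts 1960-12-01, 1961-01-01 and 1961-02-01 with data available up to 1960-11-01. Forecasting 1961-01-01, 1961-02-01 and 1961-03-01 from the last data point, 1960-12-01, requires extending the df into the future, as shown under Collect out-of-sample predictions.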
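The two layouts hold the same numbers: step1 of the 1960-10-01 row (408.547119) reappears as yhat2 for 1960-11-01 in the predict output above. In general, step<i> in the row at ds is the prediction yhat<i+1> for the target ds + i steps. The following is a minimal pandas sketch of that re-indexing, assuming monthly steps. raw_to_target_indexed is a hypothetical helper, not a NeuralProphet function, and the values are retyped from the raw output above.

```python
import pandas as pd

# Origin-indexed rows retyped from the in-sample predict(df, raw=True) output above.
raw = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "step0": [464.689362, 409.214203, 424.255768],
        "step1": [408.547119, 441.038513, 446.244385],
        "step2": [440.375763, 459.443207, 455.264343],
    }
)


def raw_to_target_indexed(raw: pd.DataFrame, n_forecasts: int) -> pd.DataFrame:
    """Hypothetical helper (monthly data assumed): re-key step<i> by target date as yhat<i+1>."""
    idx = pd.DatetimeIndex(raw["ds"])
    columns = {}
    for i in range(n_forecasts):
        # step<i> in the row at ds predicts ds + i months; that prediction is i + 1 steps old
        target = idx + pd.DateOffset(months=i)
        columns[f"yhat{i + 1}"] = pd.Series(raw[f"step{i}"].to_numpy(), index=target)
    out = pd.DataFrame(columns)  # aligns on the union of target dates; gaps become NaN
    out.index.name = "ds"
    return out


targets = raw_to_target_indexed(raw, n_forecasts=3)
print(targets)
```

For 1960-12-01, the result holds yhat1, yhat2 and yhat3 from three different rows of the raw df. Later targets such as 1961-02-01 only receive the older predictions because no later forecast origins exist in the input.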
Setting decompose=True will include the individual forecast components, which likewise refer to their respective contribution to step<i>, <i> steps into the future.
[6]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)
[6]:
| ds | step0 | step1 | step2 | trend0 | trend1 | trend2 | season_yearly0 | season_yearly1 | season_yearly2 | ar0 | ar1 | ar2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
136 | 1960-10-01 | 464.689362 | 408.547119 | 440.375763 | 702.886719 | 709.864075 | 716.616394 | -20.741653 | -35.298515 | 14.126060 | -217.455673 | -266.018463 | -290.366669 |
137 | 1960-11-01 | 409.214203 | 441.038513 | 459.443207 | 709.864075 | 716.616394 | 723.593689 | -35.298515 | 14.126060 | 5.574303 | -265.351379 | -289.703949 | -269.724792 |
138 | 1960-12-01 | 424.255768 | 446.244385 | 455.264343 | 716.616394 | 723.593689 | 730.571045 | 14.126060 | 5.574303 | -30.433420 | -306.486664 | -282.923584 | -244.873322 |
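In the rows shown, each step<i> equals trend<i> + season_yearly<i> + ar<i> up to small numerical differences. For example, 702.886719 - 20.741653 - 217.455673 = 464.689393, versus step0 = 464.689362. The short pandas check below tests this additive reading on the values retyped from the output above. It only confirms the relationship for these rows of this model; it does not describe NeuralProphet's internals.

```python
import pandas as pd

# Values retyped from the decomposed raw output above (rows 136-138).
raw = pd.DataFrame(
    {
        "ds": ["1960-10-01", "1960-11-01", "1960-12-01"],
        "step0": [464.689362, 409.214203, 424.255768],
        "step1": [408.547119, 441.038513, 446.244385],
        "step2": [440.375763, 459.443207, 455.264343],
        "trend0": [702.886719, 709.864075, 716.616394],
        "trend1": [709.864075, 716.616394, 723.593689],
        "trend2": [716.616394, 723.593689, 730.571045],
        "season_yearly0": [-20.741653, -35.298515, 14.126060],
        "season_yearly1": [-35.298515, 14.126060, 5.574303],
        "season_yearly2": [14.126060, 5.574303, -30.433420],
        "ar0": [-217.455673, -265.351379, -306.486664],
        "ar1": [-266.018463, -289.703949, -282.923584],
        "ar2": [-290.366669, -269.724792, -244.873322],
    }
)

gaps = {}
for i in range(3):  # n_forecasts=3
    # Sum the per-step components and compare with the reported step<i>
    parts = raw[f"trend{i}"] + raw[f"season_yearly{i}"] + raw[f"ar{i}"]
    gaps[f"step{i}"] = (raw[f"step{i}"] - parts).abs().max()

print(gaps)  # every gap is well below 1e-3
```

The table also shows that trend<i> and season_yearly<i> of one row equal trend<i-1> and season_yearly<i-1> of the next row, because they depend only on the target date. Only the ar terms change with the date the forecast was made.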
Collect out-of-sample predictions
This is how you can extend predictions into the unknown future:
[7]:
df = pd.read_csv(data_location + "air_passengers.csv")
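future = m.make_future_dataframe(df, periods=3)  # periods=3 matches m.n_forecasts; historic predictions are not requested (n_historic_predictions=False)
The future df contains the last 5 observed rows (n_lags=5), which serve as model inputs, followed by 3 rows with unknown y. A forecast made on it therefore only contains predictions about the yet unobserved future.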
[8]:
future.tail()
[8]:
| ds | y |
---|---|---|
3 | 1960-11-01 | 390 |
4 | 1960-12-01 | 432 |
5 | 1961-01-01 | None |
6 | 1961-02-01 | None |
7 | 1961-03-01 | None |
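The layout above (n_lags observed rows followed by periods rows of unknown y) can be mimicked with plain pandas. This is a hypothetical illustration of the shape only, not NeuralProphet's implementation. make_future_dataframe may also add columns for events or regressors, which this sketch ignores. The toy series uses made-up values.

```python
import numpy as np
import pandas as pd


def sketch_future_frame(df: pd.DataFrame, n_lags: int, periods: int, freq: str = "MS") -> pd.DataFrame:
    """Hypothetical illustration of the future-df layout; not NeuralProphet's implementation."""
    history = df[["ds", "y"]].tail(n_lags).reset_index(drop=True).copy()
    history["ds"] = pd.to_datetime(history["ds"])
    history["y"] = history["y"].astype(float)  # y becomes float so it can hold NaN
    # The periods to forecast start one step after the last observed date
    future_ds = pd.date_range(start=history["ds"].iloc[-1], periods=periods + 1, freq=freq)[1:]
    future = pd.DataFrame({"ds": future_ds, "y": np.nan})
    return pd.concat([history, future], ignore_index=True)


# Toy monthly series (made-up values) ending at 1960-12-01, like the air passengers data.
toy = pd.DataFrame({"ds": pd.date_range("1960-01-01", periods=12, freq="MS"), "y": range(12)})
sketch = sketch_future_frame(toy, n_lags=5, periods=3)
print(sketch)
```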
Predictions based on forecast target#
[9]:
forecast = m.predict(future)
forecast.tail(3)
[9]:
| ds | y | yhat1 | residual1 | yhat2 | residual2 | yhat3 | residual3 | ar1 | ar2 | ar3 | trend | season_yearly |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 1961-01-01 | NaN | 453.751007 | NaN | None | NaN | None | NaN | -275.416962 | None | None | 723.593689 | 5.574303 |
6 | 1961-02-01 | NaN | None | NaN | 463.336273 | NaN | None | NaN | None | -236.801361 | None | 730.571045 | -30.43342 |
7 | 1961-03-01 | NaN | None | NaN | None | NaN | 522.325989 | NaN | None | None | -191.955765 | 736.87323 | -22.591486 |
Predictions based on forecast start#
We can also get the forecasts based on the forecast start. Here, each stepX refers to the prediction X steps after the datestamp ds.
[10]:
forecast = m.predict(future, raw=True, decompose=False)
forecast
[10]:
| ds | step0 | step1 | step2 |
---|---|---|---|---|
0 | 1961-01-01 | 453.751007 | 463.336273 | 522.325989 |
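For exporting or plotting, it can help to turn this single wide row into one row per target date. The sketch below uses pandas melt on the values retyped from the output above, and assumes monthly steps so that stepX targets ds + X months. The resulting values match yhat1, yhat2 and yhat3 for 1961-01-01, 1961-02-01 and 1961-03-01 in the target-based forecast.

```python
import pandas as pd

# The single raw row from the output above: forecasts made with data up to 1960-12-01.
raw_future = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1961-01-01"]),
        "step0": [453.751007],
        "step1": [463.336273],
        "step2": [522.325989],
    }
)

# One row per (ds, step) pair
long = raw_future.melt(id_vars="ds", var_name="step", value_name="yhat")
long["step"] = long["step"].str.replace("step", "", regex=False).astype(int)
# Assumes monthly data: stepX targets ds + X months
long["target"] = [d + pd.DateOffset(months=s) for d, s in zip(long["ds"], long["step"])]
print(long[["target", "step", "yhat"]])
```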