
Test and CrossValidate

[1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")

Load data

[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
# df = pd.read_csv(data_location + "air_passengers.csv")

1. Basic: Train and Test a model

First, we show how to fit a model and evaluate it on a holdout set.

Note: before making any actual forecasts, re-fit the model on all available data; otherwise you will greatly reduce your forecast accuracy!

1.1 Train-Test evaluation

[3]:
m = NeuralProphet(seasonality_mode="multiplicative", learning_rate=0.1)

df = pd.read_csv(data_location + "air_passengers.csv")
df_train, df_test = m.split_df(df=df, freq="MS", valid_p=0.2)

metrics_train = m.fit(df=df_train, freq="MS")
metrics_test = m.test(df=df_test)

metrics_test

[3]:
   SmoothL1Loss        MAE          MSE
0      0.007121  31.413279  1440.266357

1.2 Bonus: Predict into the future

Before making any actual forecasts, re-fit the model on all available data; otherwise you will greatly reduce your forecast accuracy!

[4]:
metrics_train2 = m.fit(df=df, freq="MS")
forecast = m.predict(df)
fig = m.plot(forecast)

[Figure: forecast plotted over the historical data]
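The cell above predicts over the same history the model was trained on. To forecast beyond the last observed date, the dataframe can first be extended with make_future_dataframe — a minimal sketch, assuming m has been re-fit on the full dataset as above; the 12-month horizon is an arbitrary choice:

[ ]:
# Extend the data 12 months past its last date, then predict over that horizon.
df_future = m.make_future_dataframe(df, periods=12)
forecast_future = m.predict(df_future)
fig = m.plot(forecast_future)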

1.3 Bonus: Visualize training

If you installed the [live] version of NeuralProphet, you can additionally visualize your training progress and spot overfitting by evaluating the model on the validation set after every epoch.
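If you do not have the live extra yet, it is installed the same way as the base package (this pulls in the live-plotting dependencies):

[ ]:
# Install NeuralProphet with the optional live-plotting dependencies.
!pip install neuralprophet[live]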

Note: Again, before making any predictions, re-fit the model with the entire data first.

[5]:
m = NeuralProphet(seasonality_mode="multiplicative", learning_rate=0.1)

df = pd.read_csv(data_location + "air_passengers.csv")
df_train, df_test = m.split_df(df=df, freq="MS", valid_p=0.2)

metrics = m.fit(df=df_train, freq="MS", validation_df=df_test, plot_live_loss=True)
[Figure: live training and validation loss per epoch]

[6]:
metrics.tail(1)
[6]:
     SmoothL1Loss       MAE        MSE  RegLoss  SmoothL1Loss_val    MAE_val      MSE_val
307      0.000317  6.178366  64.096661      0.0          0.005971  28.745131  1207.533936

2. Time-series Cross-Validation

Time-series cross-validation is also known as rolling-origin backtesting. In the first fold, we train the model on an initial portion of the data and evaluate it over the next fold_pct of data points. In the next fold, the previous evaluation data is added to the training set, and evaluation starts later (the forecast origin ‘rolls’ forward), again measuring accuracy over the next section of data. We repeat this until the final fold’s evaluation data reaches the end of the series.

Note: before making any actual forecasts, re-fit the model on all available data; otherwise you will greatly reduce your forecast accuracy!

[7]:
METRICS = ['SmoothL1Loss', 'MAE', 'MSE']
params = {"seasonality_mode": "multiplicative", "learning_rate": 0.1}

df = pd.read_csv(data_location + "air_passengers.csv")
folds = NeuralProphet(**params).crossvalidation_split_df(df, freq="MS", k=5, fold_pct=0.20, fold_overlap_pct=0.5)
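Before fitting anything, it can help to sanity-check the folds — a small sketch that only prints the train/test size of each fold:

[ ]:
# Inspect how the forecast origin rolls forward across the folds.
for i, (df_tr, df_te) in enumerate(folds):
    print(f"fold {i}: train={len(df_tr)} rows, test={len(df_te)} rows")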
[8]:
metrics_train = pd.DataFrame(columns=METRICS)
metrics_test = pd.DataFrame(columns=METRICS)

for df_train, df_test in folds:
    m = NeuralProphet(**params)
    train = m.fit(df=df_train, freq="MS")
    test = m.test(df=df_test)
    # keep the last-epoch training metrics and the test metrics of each fold
    metrics_train = pd.concat([metrics_train, train[METRICS].iloc[-1:]])
    metrics_test = pd.concat([metrics_test, test[METRICS].iloc[-1:]])

[9]:
metrics_test.describe().loc[["mean", "std", "min", "max"]]
[9]:
      SmoothL1Loss        MAE          MSE
mean      0.012356  22.827064   883.100308
std       0.012997   8.211850   508.476321
min       0.003793  10.655480   198.452499
max       0.034445  29.894016  1331.016846

3. Using the Benchmark Framework

The benchmark framework is designed for a two-phase evaluation pipeline, which is sufficient in most cases, particularly when using cross-validation. For the remainder of this tutorial, we will use the standard Train-Test and Cross-Validation evaluation setups.

Note: The benchmarking framework currently does not properly support auto-regression or lagged covariates combined with multi-step-ahead forecasts.

[10]:
from neuralprophet.benchmark import Dataset, NeuralProphetModel, SimpleExperiment, CrossValidationExperiment

3.1 SimpleExperiment

Let’s set up a train-test experiment:

[ ]:
ts = Dataset(df=pd.read_csv(data_location + "air_passengers.csv"), name="air_passengers", freq="MS")
params = {"seasonality_mode": "multiplicative"}
exp = SimpleExperiment(
    model_class=NeuralProphetModel,
    params=params,
    data=ts,
    metrics=["MASE", "RMSE"],
    test_percentage=25,
)
result_train, result_test = exp.run()
[12]:
# result_train
result_test
[12]:
{'data': 'air_passengers',
 'model': 'NeuralProphet',
 'params': "{'seasonality_mode': 'multiplicative'}",
 'MASE': 0.5662454805745694,
 'RMSE': 28.829688840169517}

3.2 CrossValidationExperiment

Let’s cross-validate:

[ ]:
ts = Dataset(df=pd.read_csv(data_location + "air_passengers.csv"), name="air_passengers", freq="MS")
params = {"seasonality_mode": "multiplicative"}
exp_cv = CrossValidationExperiment(
    model_class=NeuralProphetModel,
    params=params,
    data=ts,
    metrics=["MASE", "RMSE"],
    test_percentage=10,
    num_folds=3,
    fold_overlap_pct=0,
)
result_train, result_test = exp_cv.run()
[14]:
result_test
[14]:
{'data': 'air_passengers',
 'model': 'NeuralProphet',
 'params': "{'seasonality_mode': 'multiplicative'}",
 'MASE': [0.5044070952378811, 0.3214836090858653, 0.6877225952888634],
 'RMSE': [21.220949835639722, 16.892885303566555, 32.749710284868435]}
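CrossValidationExperiment returns one value per fold, so you may want to aggregate them into a single summary number — a minimal sketch in plain Python:

[ ]:
# Aggregate the per-fold test metrics.
for metric in ("MASE", "RMSE"):
    values = result_test[metric]
    print(f"{metric}: mean={sum(values) / len(values):.3f}")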

4. Advanced: 3-Phase Train, Validate and Test procedure

Finally, in 4.1 and 4.2, we do a three-part data split for a proper train, validation, and test evaluation of your model. This setup is used if you do not want your performance evaluation to be biased by your manual hyperparameter tuning. It is, however, not common when working with time series, unless you work in academia. Cross-validation is usually more than adequate to evaluate model performance.

If you are confused by this, simply ignore this section and continue your forecasting life. Or, if you are curious, read up on how to evaluate machine learning models to level up your skills.

4.1 Train, Validate and Test evaluation

[ ]:
m = NeuralProphet(seasonality_mode="multiplicative", learning_rate=0.1)

df = pd.read_csv(data_location + "air_passengers.csv")
# create a test holdout set:
df_train_val, df_test = m.split_df(df=df, freq="MS", valid_p=0.2)
# create a validation holdout set:
df_train, df_val = m.split_df(df=df_train_val, freq="MS", valid_p=0.2)

# fit a model on training data and evaluate on validation set.
metrics_train1 = m.fit(df=df_train, freq="MS")
metrics_val = m.test(df=df_val)

# refit model on training and validation data and evaluate on test set.
metrics_train2 = m.fit(df=df_train_val, freq="MS")
metrics_test = m.test(df=df_test)
[16]:
metrics_train1["split"]  = "train1"
metrics_train2["split"]  = "train2"
metrics_val["split"] = "validate"
metrics_test["split"] = "test"
metrics_train1.tail(1).append([metrics_train2.tail(1), metrics_val, metrics_test]).drop(columns=['RegLoss'])
[16]:
     SmoothL1Loss        MAE         MSE     split
324      0.000368   5.348336   46.359198    train1
649      0.000571   6.835680   72.002425    train2
0        0.005073  18.293804  639.260681  validate
0        0.002529  15.051163  318.605957      test

4.2 Train, Cross-Validate and Cross-Test evaluation

[17]:
METRICS = ['SmoothL1Loss', 'MAE', 'MSE']
params = {"seasonality_mode": "multiplicative", "learning_rate": 0.1}

df = pd.read_csv(data_location + "air_passengers.csv")
folds_val, folds_test = NeuralProphet(**params).double_crossvalidation_split_df(df, freq="MS", k=5, valid_pct=0.10, test_pct=0.10)
[ ]:
metrics_train1 = pd.DataFrame(columns=METRICS)
metrics_val = pd.DataFrame(columns=METRICS)
for df_train1, df_val in folds_val:
    m = NeuralProphet(**params)
    train1 = m.fit(df=df_train1, freq="MS")
    val = m.test(df=df_val)
    metrics_train1 = pd.concat([metrics_train1, train1[METRICS].iloc[-1:]])
    metrics_val = pd.concat([metrics_val, val[METRICS].iloc[-1:]])

metrics_train2 = pd.DataFrame(columns=METRICS)
metrics_test = pd.DataFrame(columns=METRICS)
for df_train2, df_test in folds_test:
    m = NeuralProphet(**params)
    train2 = m.fit(df=df_train2, freq="MS")
    test = m.test(df=df_test)
    metrics_train2 = pd.concat([metrics_train2, train2[METRICS].iloc[-1:]])
    metrics_test = pd.concat([metrics_test, test[METRICS].iloc[-1:]])
[19]:
metrics_train2.describe().loc[["mean", "std"]]
[19]:
      SmoothL1Loss       MAE        MSE
mean      0.000269  6.724876  74.517579
std       0.000023  0.090767   1.726542
[20]:
metrics_val.describe().loc[["mean", "std"]]
[20]:
      SmoothL1Loss        MAE          MSE
mean      0.009112  29.609479  1148.171173
std       0.007296  13.811697   919.264304
[21]:
metrics_test.describe().loc[["mean", "std"]]
[21]:
      SmoothL1Loss        MAE         MSE
mean      0.001103  14.406253  295.007996
std       0.001310   7.844408  337.153951
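As a final check, the cross-validated metrics can be put side by side — a minimal sketch using the DataFrames built above:

[ ]:
# Compare mean validation and test metrics side by side.
summary = pd.concat({"validate": metrics_val.mean(), "test": metrics_test.mean()}, axis=1)
print(summary)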