
Running benchmarking experiments

Note: The Benchmarking Framework does not yet properly support auto-regression or lagged covariates when forecasting multiple steps ahead.

[1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes

import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
from neuralprophet.benchmark import Dataset, NeuralProphetModel, ProphetModel
from neuralprophet.benchmark import SimpleBenchmark, CrossValidationBenchmark
set_log_level("ERROR")
WARNING - (NP.benchmark.<module>) - Benchmarking Framework is not covered by tests. Please report any bugs you find.

Load data

[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"

air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
# retail_sales_df = pd.read_csv(data_location + 'retail_sales.csv')
# yosemite_temps_df = pd.read_csv(data_location +  'yosemite_temps.csv')
# ercot_load_df = pd.read_csv(data_location +  'ERCOT_load.csv')[['ds', 'y']]

0. Configure Datasets and Model Parameters

First, we define the datasets that we would like to benchmark on. Next, we define the models that we want to evaluate and set their hyperparameters.

[3]:
dataset_list = [
    Dataset(df = air_passengers_df, name = "air_passengers", freq = "MS"),
    # Dataset(df = peyton_manning_df, name = "peyton_manning", freq = "D"),
    # Dataset(df = retail_sales_df, name = "retail_sales", freq = "D"),
    # Dataset(df = yosemite_temps_df, name = "yosemite_temps", freq = "5min"),
    # Dataset(df = ercot_load_df, name = "ercot_load", freq = "H"),
]
model_classes_and_params = [
    (NeuralProphetModel, {"seasonality_mode": "multiplicative", "learning_rate": 0.1}),
    (ProphetModel, {"seasonality_mode": "multiplicative"})
]

Note: Because all classes used in the benchmark framework are dataclasses, they come with a readable printed representation, so we can peek into them whenever we like:

[4]:
model_classes_and_params
[4]:
[(neuralprophet.benchmark.NeuralProphetModel,
  {'seasonality_mode': 'multiplicative', 'learning_rate': 0.1}),
 (neuralprophet.benchmark.ProphetModel,
  {'seasonality_mode': 'multiplicative'})]
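
To illustrate the note about dataclasses, here is a minimal sketch using only the standard library. The class `DemoDataset` is a hypothetical stand-in invented for this example and is not part of NeuralProphet; it only shows the `__repr__` that the `@dataclass` decorator generates automatically.

```python
from dataclasses import dataclass


@dataclass
class DemoDataset:
    # Hypothetical stand-in for illustration only; not the library's Dataset class.
    name: str
    freq: str


demo = DemoDataset(name="air_passengers", freq="MS")
demo_repr = repr(demo)  # @dataclass generates __repr__ from the declared fields
print(demo_repr)  # DemoDataset(name='air_passengers', freq='MS')
```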

1. SimpleBenchmark

Setting up a series of train-test experiments is quick:

[ ]:
benchmark = SimpleBenchmark(
    model_classes_and_params=model_classes_and_params, # iterate over this list of tuples
    datasets=dataset_list, # iterate over this list
    metrics=["MAE", "MSE", "MASE", "RMSE"],
    test_percentage=25,
)
results_train, results_test = benchmark.run()
[6]:
results_test
[6]:
   data            model          params                                             MAE        MSE          MASE      RMSE
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  24.814017  844.513472   0.571375  29.060514
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             29.818648  1142.139138  0.686614  33.795549
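
For intuition about what `test_percentage=25` and the metric names stand for, the sketch below holds out the last 25% of a toy series and scores a naive last-value forecast. It uses textbook definitions of the metrics; the helper names (`split_train_test`, `mae`, `mse`, `rmse`, `mase`) and the toy data are assumptions made for this example, and the framework's own implementation may differ in detail.

```python
import math


def split_train_test(values, test_percentage):
    """Chronological split: the last test_percentage % of points form the test set."""
    n_test = max(1, int(len(values) * test_percentage / 100))
    return values[:-n_test], values[-n_test:]


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mase(actual, predicted, train):
    """MAE scaled by the in-sample MAE of a one-step naive forecast (textbook form)."""
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    return mae(actual, predicted) / (sum(naive_errors) / len(naive_errors))


series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0]  # toy data, not a real dataset
train, test = split_train_test(series, test_percentage=25)
forecast = [train[-1]] * len(test)  # naive forecast: repeat the last training value

print(mae(test, forecast), mse(test, forecast), rmse(test, forecast), mase(test, forecast, train))
```

Under this definition, a MASE below 1 means the forecast error is smaller than the in-sample error of the naive one-step forecast.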

2. CrossValidationBenchmark

Setting up a series of cross-validated experiments is just as simple:

[ ]:
benchmark_cv = CrossValidationBenchmark(
    model_classes_and_params=model_classes_and_params, # iterate over this list of tuples
    datasets=dataset_list, # iterate over this list
    metrics=["MASE", "RMSE"],
    test_percentage=10,
    num_folds=3,
    fold_overlap_pct=0,
)
results_summary, results_train, results_test = benchmark_cv.run()

We now also get a summary DataFrame showing the metrics’ mean and standard deviation over all folds.

[8]:
results_summary
[8]:
   data            model          params                                             MASE      RMSE       MASE_std  RMSE_std  split
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  0.278579  7.664800   0.011388  0.696316  train
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             0.310869  8.616463   0.021078  1.266764  train
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  0.482023  22.380878  0.140785  5.769275  test
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             0.479454  22.778341  0.109195  4.224042  test
[9]:
air_passengers = results_summary[results_summary['data'] == 'air_passengers']
air_passengers = air_passengers[air_passengers['split'] == 'test']
ax = air_passengers.plot(x='model', y='RMSE', kind='barh')  # returns a matplotlib Axes
[Figure: horizontal bar chart of test RMSE per model]
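
One plausible reading of `test_percentage`, `num_folds` and `fold_overlap_pct` is an expanding-window scheme in which each fold's test block covers `test_percentage` percent of the data and consecutive test blocks are shifted back in time. The sketch below implements that reading on index ranges only. The function `cv_fold_indices` is hypothetical, treating `fold_overlap_pct` as a percentage of the fold size is an assumption, and the framework's actual splitting logic may differ.

```python
def cv_fold_indices(n_samples, test_percentage, num_folds, fold_overlap_pct=0):
    """Return (train_range, test_range) pairs, oldest fold first.

    Assumed scheme: test blocks are taken from the end of the series and
    every training range starts at index 0 (expanding window).
    """
    fold_size = max(1, int(n_samples * test_percentage / 100))
    overlap = int(fold_size * fold_overlap_pct / 100)
    step = fold_size - overlap  # how far each earlier test block is shifted back
    folds = []
    test_end = n_samples
    for _ in range(num_folds):
        test_start = test_end - fold_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        folds.append((range(0, test_start), range(test_start, test_end)))
        test_end -= step
    return folds[::-1]


# Example with 144 samples, mirroring the parameters used in the cell above.
folds = cv_fold_indices(n_samples=144, test_percentage=10, num_folds=3, fold_overlap_pct=0)
for train_idx, test_idx in folds:
    print(f"train: 0..{train_idx.stop - 1}, test: {test_idx.start}..{test_idx.stop - 1}")
```

With these settings each test block has 14 points, and the training range grows from 102 to 130 points across the three folds.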

The metrics for each fold are also recorded individually:

[10]:
results_test
[10]:
   data            model          params                                             MASE                                               RMSE
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  [0.5014669235699923, 0.3006991286697633, 0.643...  [20.975299319246627, 16.12341769010618, 30.043...
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             [0.5885898654326461, 0.3302688815416577, 0.519...  [24.617703272976527, 16.936631879241148, 26.78...
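
The per-fold lists can be collapsed into a mean and a standard deviation per model, in the spirit of the summary shown earlier. The sketch below does this with the standard library on made-up fold values (the model names and numbers are hypothetical, since the real lists above are truncated). Using the population standard deviation (`statistics.pstdev`) is an assumption; the framework may use a different estimator.

```python
import statistics

# Hypothetical per-fold RMSE values, one list entry per fold.
fold_rmse = {
    "ModelA": [20.0, 16.0, 30.0],
    "ModelB": [24.0, 17.0, 25.0],
}

# Collapse each list into a mean and a (population) standard deviation.
summary = {
    model: {
        "RMSE": statistics.fmean(values),
        "RMSE_std": statistics.pstdev(values),
    }
    for model, values in fold_rmse.items()
}

for model, stats in summary.items():
    print(model, round(stats["RMSE"], 3), round(stats["RMSE_std"], 3))
```

Two models with the same mean can still differ noticeably in spread across folds, which is why the `_std` columns are worth reading alongside the means.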

3. Manual Benchmark

If you need more control over the individual Experiments, you can set them up manually:

[11]:
from neuralprophet.benchmark import SimpleExperiment, CrossValidationExperiment
from neuralprophet.benchmark import ManualBenchmark, ManualCVBenchmark

3.1 ManualBenchmark: Manual SimpleExperiment Benchmark

[ ]:
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
metrics = ["MAE", "MSE", "RMSE", "MASE", "MSSE", "MAPE", "SMAPE"]
experiments = [
    SimpleExperiment(
        model_class=NeuralProphetModel,
        params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=25,
    ),
    SimpleExperiment(
        model_class=ProphetModel,
        params={"seasonality_mode": "multiplicative", },
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=25,
    ),
    SimpleExperiment(
        model_class=NeuralProphetModel,
        params={"learning_rate": 0.1},
        data=Dataset(df=peyton_manning_df, name="peyton_manning", freq="D"),
        metrics=metrics,
        test_percentage=15,
    ),
    SimpleExperiment(
        model_class=ProphetModel,
        params={},
        data=Dataset(df=peyton_manning_df, name="peyton_manning", freq="D"),
        metrics=metrics,
        test_percentage=15,
    ),
]
benchmark = ManualBenchmark(
    experiments=experiments,
    metrics=metrics,
)
results_train, results_test = benchmark.run()
[13]:
results_test
[13]:
   data            model          params                                             MAE        MSE          RMSE       MASE      MSSE      MAPE      SMAPE
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  23.922920  795.467295   28.204030  0.550857  0.305727  5.749847  2.766594
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             29.818648  1142.139138  33.795549  0.686614  0.438966  7.471930  3.558548
2  peyton_manning  NeuralProphet  {'learning_rate': 0.1}                             0.544561   0.409436     0.639872   1.542943  1.537802  7.006563  3.374985
3  peyton_manning  Prophet        {}                                                 0.594528   0.463643     0.680913   1.684520  1.741397  7.673804  3.682554
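
Once results cover several datasets, a common next step is to pick the best model per dataset by one metric. The sketch below does this in plain Python on the MASE values copied from the table above. The `rows` list is typed in by hand for illustration; in the notebook, something like `results_test.to_dict("records")` would presumably yield an equivalent structure, assuming `results_test` is a pandas DataFrame.

```python
# MASE values copied from the test results table above.
rows = [
    {"data": "air_passengers", "model": "NeuralProphet", "MASE": 0.550857},
    {"data": "air_passengers", "model": "Prophet", "MASE": 0.686614},
    {"data": "peyton_manning", "model": "NeuralProphet", "MASE": 1.542943},
    {"data": "peyton_manning", "model": "Prophet", "MASE": 1.684520},
]

# Keep, for every dataset, the row with the lowest MASE (lower is better).
best = {}
for row in rows:
    current = best.get(row["data"])
    if current is None or row["MASE"] < current["MASE"]:
        best[row["data"]] = row

best_models = {data: row["model"] for data, row in best.items()}
print(best_models)  # {'air_passengers': 'NeuralProphet', 'peyton_manning': 'NeuralProphet'}
```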

3.2 ManualCVBenchmark: Manual CrossValidationExperiment Benchmark

[ ]:
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
experiments = [
    CrossValidationExperiment(
        model_class=NeuralProphetModel,
        params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=10,
        num_folds=3,
        fold_overlap_pct=0,
    ),
    CrossValidationExperiment(
        model_class=ProphetModel,
        params={"seasonality_mode": "multiplicative", },
        data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
        metrics=metrics,
        test_percentage=10,
        num_folds=3,
        fold_overlap_pct=0,
    ),
]
benchmark_cv = ManualCVBenchmark(
    experiments=experiments,
    metrics=metrics,
)
results_summary, results_train, results_test = benchmark_cv.run()
[15]:
results_summary
[15]:
   data            model          params                                             MAE        MSE         RMSE       MASE      MSSE      MAPE      SMAPE     MAE_std   MSE_std     RMSE_std  MASE_std  MSSE_std  MAPE_std  SMAPE_std  split
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  6.016460   61.312444   7.806123   0.282338  0.081969  3.053079  1.512873  0.573447  9.319967    0.613913  0.013311  0.009355  0.106123  0.053336   train
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             6.655735   75.848123   8.616463   0.310869  0.098696  3.089578  1.553327  0.952939  20.968356   1.266764  0.021078  0.014689  0.261419  0.132790   train
0  air_passengers  NeuralProphet  {'seasonality_mode': 'multiplicative', 'learni...  19.514176  528.803367  22.342478  0.482325  0.233347  4.630512  2.340100  6.610696  252.724588  5.442155  0.135160  0.089620  1.386856  0.732939   test
1  air_passengers  Prophet        {'seasonality_mode': 'multiplicative'}             19.052098  536.695336  22.778341  0.479454  0.249282  4.604149  2.272174  3.876074  182.404522  4.224042  0.109195  0.103208  0.710556  0.353903   test