Running benchmarking experiments¶
Note: The Benchmarking Framework does currently not properly support auto-regression or lagged covariates with multiple step ahead forecasts.
[1]:
if 'google.colab' in str(get_ipython()):
!pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
#!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes
# we also need prophet for this notebook
# !pip install prophet
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
from neuralprophet.benchmark import Dataset, NeuralProphetModel, ProphetModel
from neuralprophet.benchmark import SimpleBenchmark, CrossValidationBenchmark
set_log_level("ERROR")
Importing plotly failed. Interactive plots will not work.
INFO - (NP.benchmark.<module>) - Note: The benchmarking framework is not properly documented.Please help us by reporting any bugs and adding documentation.Multiprocessing is not covered by tests and may break on your device.If you use multiprocessing, only run one benchmark per python script.
Load data¶
[2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
# retail_sales_df = pd.read_csv(data_location + 'retail_sales.csv')
# yosemite_temps_df = pd.read_csv(data_location + 'yosemite_temps.csv')
# ercot_load_df = pd.read_csv(data_location + 'ERCOT_load.csv')[['ds', 'y']]
0. Configure Datasets and Model Parameters¶
First, we define the datasets that we would like to benchmerk on. Next, we define the models that we want to evaluate and set their hyperparameters.
[3]:
dataset_list = [
Dataset(df = air_passengers_df, name = "air_passengers", freq = "MS"),
# Dataset(df = peyton_manning_df, name = "peyton_manning", freq = "D"),
# Dataset(df = retail_sales_df, name = "retail_sales", freq = "D"),
# Dataset(df = yosemite_temps_df, name = "yosemite_temps", freq = "5min"),
# Dataset(df = ercot_load_df, name = "ercot_load", freq = "H"),
]
model_classes_and_params = [
(NeuralProphetModel, {"seasonality_mode": "multiplicative", "learning_rate": 0.1}),
(ProphetModel, {"seasonality_mode": "multiplicative"})
]
Note: As all the classes used in the Benchmark framework are dataclasses, they have a print function, allowing us to peek into them if we like:
[4]:
model_classes_and_params
[4]:
[(neuralprophet.benchmark.NeuralProphetModel,
{'seasonality_mode': 'multiplicative', 'learning_rate': 0.1}),
(neuralprophet.benchmark.ProphetModel,
{'seasonality_mode': 'multiplicative'})]
1. SimpleBenchmark¶
Setting up a series of Train Test Experiments is quick:
[5]:
benchmark = SimpleBenchmark(
model_classes_and_params=model_classes_and_params, # iterate over this list of tuples
datasets=dataset_list, # iterate over this list
metrics=["MAE", "MSE", "MASE", "RMSE"],
test_percentage=25,
)
results_train, results_test = benchmark.run()
INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -2.35721
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 383.095 0.000197806 75.6156 0.07304 1 124
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
112 383.482 0.000187545 141.645 1.283e-06 0.001 179 LS failed, Hessian reset
159 384.105 0.000328631 165.657 3.031e-06 0.001 277 LS failed, Hessian reset
199 384.233 0.000179066 78.5608 0.4924 0.4924 326
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
299 385.249 0.00016096 76.7537 0.2907 0.2907 446
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
314 385.282 9.21668e-05 100.953 1.111e-06 0.001 501 LS failed, Hessian reset
346 385.294 7.66049e-06 75.7986 9.901e-08 0.001 582 LS failed, Hessian reset
372 385.294 8.2438e-09 78.8225 0.328 0.328 615
Optimization terminated normally:
Convergence detected: absolute parameter change was below tolerance
[6]:
results_test
[6]:
data | model | params | MAE | MSE | MASE | RMSE | |
---|---|---|---|---|---|---|---|
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 24.027794 | 795.101013 | 1.182601 | 28.197536 |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 29.818647 | 1142.139160 | 1.467615 | 33.795551 |
2. CrossValidationBenchmark¶
Setting up a series of crossvalidated experiments is just as simple:
[ ]:
benchmark_cv = CrossValidationBenchmark(
model_classes_and_params=model_classes_and_params, # iterate over this list of tuples
datasets=dataset_list, # iterate over this list
metrics=["MASE", "RMSE"],
test_percentage=10,
num_folds=3,
fold_overlap_pct=0,
)
results_summary, results_train, results_test = benchmark_cv.run()
We now also get a summary DataFrame showing the metrics’ mean and standard deviation over all folds.
[8]:
results_summary
[8]:
data | model | params | MASE | RMSE | MASE_std | RMSE_std | split | |
---|---|---|---|---|---|---|---|---|
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 0.280458 | 7.657820 | 0.011841 | 0.713385 | train |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 0.310869 | 8.616463 | 0.021078 | 1.266764 | train |
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 0.880757 | 21.963869 | 0.244119 | 5.522509 | test |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 0.893797 | 22.778341 | 0.161704 | 4.224042 | test |
[9]:
air_passengers = results_summary[results_summary['data'] == 'air_passengers']
air_passengers = air_passengers[air_passengers['split'] == 'test']
plt = air_passengers.plot(x='model', y='RMSE', kind='barh')

The metrics for each fold are also recoreded individually:
[10]:
results_test
[10]:
data | model | params | MASE | RMSE | |
---|---|---|---|---|---|
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | [0.8573083, 0.5941888, 1.1907747] | [20.638454, 15.961023, 29.292131] |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | [1.0298454, 0.66658664, 0.9849584] | [24.617702, 16.936632, 26.780687] |
3. Manual Benchmark¶
If you need more control over the individual Experiments, you can set them up manually:
[11]:
from neuralprophet.benchmark import SimpleExperiment, CrossValidationExperiment
from neuralprophet.benchmark import ManualBenchmark, ManualCVBenchmark
3.1 ManualBenchmark: Manual SimpleExperiment Benchmark¶
[ ]:
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
peyton_manning_df = pd.read_csv(data_location + 'wp_log_peyton_manning.csv')
metrics = ["MAE", "MSE", "RMSE", "MASE", "RMSSE", "MAPE", "SMAPE"]
experiments = [
SimpleExperiment(
model_class=NeuralProphetModel,
params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
metrics=metrics,
test_percentage=25,
),
SimpleExperiment(
model_class=ProphetModel,
params={"seasonality_mode": "multiplicative", },
data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
metrics=metrics,
test_percentage=25,
),
SimpleExperiment(
model_class=NeuralProphetModel,
params={"learning_rate": 0.1},
data=Dataset(df=peyton_manning_df, name="peyton_manning", freq="D"),
metrics=metrics,
test_percentage=15,
),
SimpleExperiment(
model_class=ProphetModel,
params={},
data=Dataset(df=peyton_manning_df, name="peyton_manning", freq="D"),
metrics=metrics,
test_percentage=15,
),
]
benchmark = ManualBenchmark(
experiments=experiments,
metrics=metrics,
)
results_train, results_test = benchmark.run()
[13]:
results_test
[13]:
data | model | params | MAE | MSE | RMSE | MASE | RMSSE | MAPE | SMAPE | |
---|---|---|---|---|---|---|---|---|---|---|
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 24.005142 | 793.559387 | 28.170187 | 1.181486 | 1.090349 | 5.773936 | 2.777174 |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 29.818647 | 1142.139160 | 33.795551 | 1.467615 | 1.308083 | 7.471930 | 3.558547 |
2 | peyton_manning | NeuralProphet | {'learning_rate': 0.1} | 0.575561 | 0.446854 | 0.668472 | 1.870918 | 1.411006 | 7.394003 | 3.551846 |
3 | peyton_manning | Prophet | {} | 0.597307 | 0.465471 | 0.682255 | 1.941605 | 1.440099 | 7.704831 | 3.697093 |
3.2 ManualCVBenchmark: Manual CrossValidationExperiment Benchmark¶
[ ]:
air_passengers_df = pd.read_csv(data_location + 'air_passengers.csv')
experiments = [
CrossValidationExperiment(
model_class=NeuralProphetModel,
params={"seasonality_mode": "multiplicative", "learning_rate": 0.1},
data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
metrics=metrics,
test_percentage=10,
num_folds=3,
fold_overlap_pct=0,
),
CrossValidationExperiment(
model_class=ProphetModel,
params={"seasonality_mode": "multiplicative", },
data=Dataset(df=air_passengers_df, name="air_passengers", freq="MS"),
metrics=metrics,
test_percentage=10,
num_folds=3,
fold_overlap_pct=0,
),
]
benchmark_cv = ManualCVBenchmark(
experiments=experiments,
metrics=metrics,
)
results_summary, results_train, results_test = benchmark_cv.run()
[15]:
results_summary
[15]:
data | model | params | MAE | MSE | RMSE | MASE | RMSSE | MAPE | SMAPE | MAE_std | MSE_std | RMSE_std | MASE_std | RMSSE_std | MAPE_std | SMAPE_std | split | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 5.996241 | 59.247303 | 7.662074 | 0.281055 | 0.279981 | 3.043633 | 1.511519 | 0.633657 | 10.927888 | 0.734798 | 0.012510 | 0.013638 | 0.079835 | 0.040635 | train |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 6.655735 | 75.848122 | 8.616463 | 0.310869 | 0.313269 | 3.089578 | 1.553327 | 0.952938 | 20.968357 | 1.266764 | 0.021078 | 0.023619 | 0.261420 | 0.132790 | train |
0 | air_passengers | NeuralProphet | {'seasonality_mode': 'multiplicative', 'learni... | 19.368925 | 522.388733 | 22.173418 | 0.894181 | 0.802707 | 4.570522 | 2.310261 | 6.710804 | 258.608643 | 5.543307 | 0.240341 | 0.144602 | 1.391520 | 0.738841 | test |
1 | air_passengers | Prophet | {'seasonality_mode': 'multiplicative'} | 19.052099 | 536.695312 | 22.778341 | 0.893797 | 0.835219 | 4.604149 | 2.272174 | 3.876074 | 182.404541 | 4.224042 | 0.161704 | 0.157025 | 0.710556 | 0.353903 | test |
[ ]: