Tutorial 10: Validation and Reproducibility#
Validation#
[1]:
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
# Load the dataset from the CSV file using pandas
df = pd.read_csv("https://github.com/ourownstory/neuralprophet-data/raw/main/kaggle-energy/datasets/tutorial01.csv")
# Disable logging messages unless there is an error
set_log_level("ERROR")
# Model and prediction
m = NeuralProphet()
m.set_plotting_backend("plotly-static")
Next, we split the dataset into a training set and a validation set. We will use the validation set to check the performance of our model on data it was not trained on. Here the validation set holds 20% of the total data; adapt its size with the valid_p parameter of split_df.
[2]:
df_train, df_val = m.split_df(df, valid_p=0.2)
print("Dataset size:", len(df))
print("Train dataset size:", len(df_train))
print("Validation dataset size:", len(df_val))
Dataset size: 1462
Train dataset size: 1170
Validation dataset size: 292
Validation is performed by passing the validation set to the fit method via the validation_df argument. The resulting metrics report the model's performance on both the training set and the validation set after each epoch.
[3]:
metrics = m.fit(df_train, validation_df=df_val, progress=None)
metrics
[3]:
|  | MAE_val | RMSE_val | Loss_val | RegLoss_val | epoch | MAE | RMSE | Loss | RegLoss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 151.067062 | 159.602341 | 2.067798 | 0.0 | 0 | 75.920654 | 89.007133 | 0.699122 | 0.0 |
| 1 | 147.524399 | 155.866516 | 2.007845 | 0.0 | 1 | 74.146973 | 86.745255 | 0.676098 | 0.0 |
| 2 | 143.015457 | 151.105865 | 1.931547 | 0.0 | 2 | 71.729416 | 84.111290 | 0.645402 | 0.0 |
| 3 | 137.148010 | 144.921494 | 1.832287 | 0.0 | 3 | 68.274185 | 80.496658 | 0.602091 | 0.0 |
| 4 | 129.434494 | 136.787064 | 1.701819 | 0.0 | 4 | 64.227638 | 75.879417 | 0.549886 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 180 | 7.111052 | 9.061026 | 0.011818 | 0.0 | 180 | 4.582942 | 6.183656 | 0.004233 | 0.0 |
| 181 | 7.106644 | 9.057316 | 0.011808 | 0.0 | 181 | 4.587008 | 6.228304 | 0.004246 | 0.0 |
| 182 | 7.100244 | 9.049046 | 0.011786 | 0.0 | 182 | 4.592853 | 6.206255 | 0.004245 | 0.0 |
| 183 | 7.102000 | 9.050427 | 0.011790 | 0.0 | 183 | 4.603105 | 6.197680 | 0.004274 | 0.0 |
| 184 | 7.101621 | 9.050205 | 0.011789 | 0.0 | 184 | 4.579907 | 6.184962 | 0.004225 | 0.0 |

185 rows × 9 columns
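If you only care about the end-of-training scores, you can read them off the last row of the returned metrics DataFrame with plain pandas indexing (a minimal sketch; the column names are those shown in the table above):

final = metrics.iloc[-1]
# Validation error after the last training epoch
print("Final MAE_val: ", final["MAE_val"])
print("Final RMSE_val:", final["RMSE_val"])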
[4]:
forecast = m.predict(df)
m.plot(forecast)
For advanced validation and testing methods, check out the Test and CrossValidate tutorial in the How to guides section.
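As a preview of that tutorial, a common pattern is to carve out an additional hold-out test set and score the fitted model on it once. The sketch below is an assumption, not part of this tutorial's code: it performs a hypothetical two-stage split (last 10% as test, then 20% of the remainder as validation) and relies on NeuralProphet's test method as covered in the Test and CrossValidate tutorial.

# Hypothetical three-way split: hold out the last 10% as a test set,
# then split the remainder into train and validation
m_hold = NeuralProphet()
df_trainval, df_test = m_hold.split_df(df, valid_p=0.1)
df_tr, df_va = m_hold.split_df(df_trainval, valid_p=0.2)

m_hold.fit(df_tr, validation_df=df_va, progress=None)
test_metrics = m_hold.test(df_test)  # score once on the untouched test set
print(test_metrics)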
Reproducibility#
The variability of results comes from stochastic gradient descent finding different optima on different runs. Most of the randomness stems from the random initialization of the weights, differences in the learning rate, and the shuffling of the dataloader. We can control the random number generator by setting its seed:
[5]:
from neuralprophet import set_random_seed
set_random_seed(0)
This should lead to identical results every time you run the model. Note that you have to explicitly reset the random seed to the same value before each fit; setting it once is not enough, because fitting a model consumes random state.
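To illustrate the pattern, the sketch below (variable names are ours) re-seeds before each of two fits so that both runs start from the same random state:

# First run: seed, build, fit
set_random_seed(0)
m1 = NeuralProphet()
metrics1 = m1.fit(df_train, validation_df=df_val, progress=None)

# Second run: re-seed with the same value before fitting again
set_random_seed(0)
m2 = NeuralProphet()
metrics2 = m2.fit(df_train, validation_df=df_val, progress=None)

# Both runs should now report identical final losses
print(metrics1.iloc[-1]["Loss"], metrics2.iloc[-1]["Loss"])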