NeuralProphet Class#
- class neuralprophet.forecaster.NeuralProphet(growth: Literal['off', 'linear', 'discontinuous'] = 'linear', changepoints: Optional[list] = None, n_changepoints: int = 10, changepoints_range: float = 0.8, trend_reg: float = 0, trend_reg_threshold: Optional[Union[bool, float]] = False, trend_global_local: str = 'global', trend_local_reg: Optional[Union[bool, float]] = False, yearly_seasonality: Union[Literal['auto'], bool, int] = 'auto', yearly_seasonality_glocal_mode: Union[Literal['auto'], bool, int] = 'auto', weekly_seasonality: Union[Literal['auto'], bool, int] = 'auto', weekly_seasonality_glocal_mode: Union[Literal['auto'], bool, int] = 'auto', daily_seasonality: Union[Literal['auto'], bool, int] = 'auto', daily_seasonality_glocal_mode: Union[Literal['auto'], bool, int] = 'auto', seasonality_mode: Literal['additive', 'multiplicative'] = 'additive', seasonality_reg: float = 0, season_global_local: Literal['global', 'local', 'glocal'] = 'global', seasonality_local_reg: Optional[Union[bool, float]] = False, future_regressors_model: Literal['linear', 'neural_nets', 'shared_neural_nets'] = 'linear', future_regressors_d_hidden: int = 4, future_regressors_num_hidden_layers: int = 2, n_forecasts: int = 1, n_lags: int = 0, ar_layers: Optional[list] = [], ar_reg: Optional[float] = None, lagged_reg_layers: Optional[list] = [], learning_rate: Optional[float] = None, epochs: Optional[int] = None, batch_size: Optional[int] = None, loss_func: Union[str, torch.nn.modules.loss._Loss, Callable] = 'SmoothL1Loss', optimizer: Union[str, Type[torch.optim.optimizer.Optimizer]] = 'AdamW', newer_samples_weight: float = 2, newer_samples_start: float = 0.0, quantiles: List[float] = [], impute_missing: bool = True, impute_linear: int = 10, impute_rolling: int = 10, drop_missing: bool = False, collect_metrics: Union[bool, list, dict] = True, normalize: Literal['auto', 'soft', 'soft1', 'minmax', 'standardize', 'off'] = 'auto', global_normalization: bool = False, global_time_normalization: bool = True, unknown_data_normalization: bool = False, accelerator: Optional[str] = None, trainer_config: dict = {}, prediction_frequency: Optional[dict] = None)#
NeuralProphet forecaster.
A simple yet powerful forecaster that models trend, seasonality, events, holidays, auto-regression, lagged covariates, and future-known regressors. Can be regularized and configured to model nonlinear relationships.
- Parameters
growth ({'off', 'linear', 'discontinuous'}, default 'linear') –
Set use of trend growth type.
- Options
  - off: no trend
  - (default) linear: fits a piece-wise linear trend with n_changepoints + 1 segments
  - discontinuous: for advanced users only - not a conventional trend, allows arbitrary jumps at each trend changepoint
changepoints ({list of str, list of np.datetimes or np.array of np.datetimes}, optional) –
Manually set dates at which to include potential changepoints.
Note
Does not accept np.array of np.str. If not specified, potential changepoints are selected automatically.
n_changepoints (int) –
Number of potential trend changepoints to include.
Note
Changepoints are selected uniformly from the first changepoints_range proportion of the history. Ignored if a manual changepoints list is supplied.
changepoints_range (float) –
Proportion of history in which trend changepoints will be estimated.
e.g. set to 0.8 to allow changepoints only in the first 80% of training data. Ignored if a manual changepoints list is supplied.
trend_reg (float, optional) –
Parameter modulating the flexibility of the automatic changepoint selection.
Note
Large values (~1-100) will limit the variability of changepoints. Small values (~0.001-1.0) will allow changepoints to change faster. default: 0 will fully fit a trend to each segment.
trend_reg_threshold (bool, optional) –
Allowance for trend to change without regularization.
- Options
  - True: automatically set to a value that leads to a smooth trend
  - (default) False: all changes in changepoints are regularized
trend_global_local (str, default 'global') –
Modelling strategy of the trend when multiple time series are present.
- Options
  - global: all the elements are modelled with the same trend
  - local: each element is modelled with a different trend
Note
When only one time series is input, this parameter should not be provided. Internally it will be set to global, meaning that all the elements (only one in this case) are modelled with the same trend.
trend_local_reg (Optional[Union[bool, float]], default False) –
Parameter to regularize weights to induce similarity between global and local trend.
Note
Large values (~100) will limit the variability of changepoints. Small values (~0.001) will allow changepoints to change faster.
yearly_seasonality (bool, int) –
Fit yearly seasonality.
- Options
  - True or False
  - auto: set automatically
  - value: number of Fourier/linear terms to generate
yearly_seasonality_glocal_mode (bool, str) –
Modelling mode of the yearly seasonality. Only effective on multiple time series.
- Options
  - global
  - local
  - glocal
weekly_seasonality (bool, int) –
Fit weekly seasonality.
- Options
  - True or False
  - auto: set automatically
  - value: number of Fourier/linear terms to generate
weekly_seasonality_glocal_mode (bool, str) –
Modelling mode of the weekly seasonality. Only effective on multiple time series.
- Options
  - global
  - local
  - glocal
daily_seasonality (bool, int) –
Fit daily seasonality.
- Options
  - True or False
  - auto: set automatically
  - value: number of Fourier/linear terms to generate
daily_seasonality_glocal_mode (bool, str) –
Modelling mode of the daily seasonality. Only effective on multiple time series.
- Options
  - global
  - local
  - glocal
seasonality_mode (str) –
Specifies mode of seasonality
- Options
  - (default) additive
  - multiplicative
seasonality_reg (float, optional) –
Parameter modulating the strength of the seasonality model.
Note
Smaller values (~0.1-1) allow the model to fit larger seasonal fluctuations, larger values (~1-100) dampen the seasonality. default: 0, no regularization
season_global_local (str, default 'global') –
Modelling strategy of the general/default seasonality when multiple time series are present.
- Options
  - global: all the elements are modelled with the same seasonality
  - local: each element is modelled with a different seasonality
Note
When only one time series is input, this parameter should not be provided. Internally it will be set to global, meaning that all the elements (only one in this case) are modelled with the same seasonality.
seasonality_local_reg (Optional[Union[bool, float]], default False) –
Parameter to regularize weights to induce similarity between global and local seasonality.
Note
Large values (~100) will limit the variability of changepoints. Small values (~0.001) will allow changepoints to change faster.
future_regressors_model (str) –
Model type used for future regressors.
- Options
  - (default) linear
  - neural_nets
  - shared_neural_nets
future_regressors_d_hidden (int) – Dimension of the hidden layers in the neural network model for future regressors. Ignored if future_regressors_model is linear.
future_regressors_num_hidden_layers (int) – Number of hidden layers in the neural network model for future regressors. Ignored if future_regressors_model is linear.
.n_lags (int) – Previous time series steps to include in auto-regression. Aka AR-order
ar_reg (float, optional) –
How much sparsity to induce in the AR-coefficients.
Note
Large values (~1-100) will limit the number of nonzero coefficients dramatically. Small values (~0.001-1.0) will allow more non-zero coefficients. default: 0, no regularization of coefficients.
ar_layers (list of int, optional) – Array of hidden layer dimensions of the AR-Net. Specifies number of hidden layers (number of entries) and layer dimension (list entry).
n_forecasts (int) – Number of steps ahead of prediction time step to forecast.
lagged_reg_layers (list of int, optional) – Array of hidden layer dimensions of the Covar-Net. Specifies number of hidden layers (number of entries) and layer dimension (list entry).
learning_rate (float) –
Maximum learning rate setting for 1cycle policy scheduler.
Note
Default None: automatically sets the learning_rate based on a learning rate range test. For manual user input, try values ~0.001-10.
epochs (int) –
Number of epochs (complete iterations over dataset) to train model.
Note
Default None: automatically sets the number of epochs based on dataset size. For best results also leave batch_size to None. For manual values, try ~5-500.
batch_size (int) –
Number of samples per mini-batch.
If not provided, batch_size is approximated based on dataset size. For manual values, try ~8-1024. For best results also leave epochs to None.
newer_samples_weight (float, default 2.0) –
Sets factor by which the model fit is skewed towards more recent observations.
Controls the factor by which final samples are weighted more compared to initial samples. Applies a positional weighting to each sample's loss value.
e.g. newer_samples_weight = 2: final samples are weighted twice as much as initial samples.
newer_samples_start (float, default 0.0) –
Sets beginning of 'newer' samples as fraction of training data.
Throughout the range of 'newer' samples, the weight is increased from 1.0/newer_samples_weight initially to 1.0 at the end, in a monotonically increasing function (cosine from pi to 2*pi).
loss_func (str, torch.nn.functional.loss) –
Type of loss to use:
- Options
  - (default) SmoothL1Loss: SmoothL1 loss function
  - MSE: Mean Squared Error loss function
  - MAE: Mean Absolute Error loss function
  - torch.nn.functional.loss: loss or callable for custom loss, e.g. L1-Loss
Examples
>>> from neuralprophet import NeuralProphet
>>> import torch
>>> import torch.nn as nn
>>> m = NeuralProphet(loss_func=torch.nn.L1Loss)
collect_metrics (list of str, dict, bool) –
Set metrics to compute.
- Options
  - (default) True: ["mae", "rmse"]
  - False: no metrics
  - list: valid options are ["mae", "rmse", "mse"]
  - dict: collection of names of torchmetrics.Metric objects
Examples
>>> from neuralprophet import NeuralProphet
>>> # compute MSE, MAE and RMSE
>>> m = NeuralProphet(collect_metrics=["MSE", "MAE", "RMSE"])
>>> # use custom torchmetrics names
>>> m = NeuralProphet(collect_metrics={"MAPE": "MeanAbsolutePercentageError", "MSLE": "MeanSquaredLogError"})
quantiles (list, default []) – A list of float values between (0, 1) which indicate the set of quantiles to be estimated.
impute_missing (bool) –
Whether to automatically impute missing dates/values
Note
imputation follows a linear method up to 20 missing values, more are filled with trend.
impute_linear (int) – maximal number of missing dates/values to be imputed linearly (default: 10)
impute_rolling (int) – maximal number of missing dates/values to be imputed using rolling average (default: 10)
drop_missing (bool) –
Whether to automatically drop missing samples from the data
- Options
  - (default) False: samples containing NaN values are not dropped
  - True: any sample containing at least one NaN value will be dropped
normalize (str) –
Type of normalization to apply to the time series.
- Options
  - off: bypasses data normalization
  - (default, binary timeseries) minmax: scales the minimum value to 0.0 and the maximum value to 1.0
  - standardize: zero-centers and divides by the standard deviation
  - (default) soft: scales the minimum value to 0.0 and the 95th quantile to 1.0
  - soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
global_normalization (bool) –
Activation of global normalization
- Options
  - True: dict of dataframes is used as global_time_normalization
  - (default) False: local normalization
global_time_normalization (bool) –
Specifies global time normalization
- Options
  - (default) True: only valid in case of global modeling, local normalization
  - False: set time data_params locally
unknown_data_normalization (bool) –
Specifies unknown data normalization
- Options
  - True: test data is normalized with global data params even if trained with local data params (global modeling with local normalization)
  - (default) False: no global modeling with local normalization
accelerator (str) – Name of accelerator from pytorch_lightning.accelerators to use for training. Use “auto” to automatically select an available accelerator. Provide None to deactivate the use of accelerators.
trainer_config (dict) – Dictionary of additional trainer configuration parameters.
prediction_frequency (dict) –
periodic interval in which forecasts should be made. More than one item only allowed for {"daily-hour": x, "weekly-day": y} to forecast on a specific hour of a specific day of week.
- Key: str
  periodicity of the predictions to be made.
- value: int
  forecast origin of the predictions to be made, e.g. 7 for 7am in case of 'daily-hour'.
- Options
  - 'hourly-minute': forecast once per hour at a specified minute
  - 'daily-hour': forecast once per day at a specified hour
  - 'weekly-day': forecast once per week at a specified day
  - 'monthly-day': forecast once per month at a specified day
  - 'yearly-month': forecast once per year at a specified month
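Examples
A minimal construction sketch; the parameter values here are illustrative, not recommendations:
>>> from neuralprophet import NeuralProphet
>>> # autoregression over the last 24 observed steps, forecasting 12 steps ahead,
>>> # with 5% and 95% uncertainty quantiles
>>> m = NeuralProphet(n_lags=24, n_forecasts=12, quantiles=[0.05, 0.95])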
- add_country_holidays(country_name: Union[str, list], lower_window: int = 0, upper_window: int = 0, regularization: Optional[float] = None, mode: str = 'additive')#
Add a country into the NeuralProphet object to include country-specific holidays and create the corresponding configs such as lower and upper windows and the regularization parameters.
Holidays can only be added for a single country or country list. Calling the function multiple times will override already added country holidays.
- Parameters
country_name (str, list) – name or list of names of the country
lower_window (int) – the lower window for all the country holidays
upper_window (int) – the upper window for all the country holidays
regularization (float) – optional scale for regularization strength
mode (str) – additive (default) or multiplicative.
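Examples
A minimal usage sketch; the country code and window values are illustrative:
>>> m = NeuralProphet()
>>> m = m.add_country_holidays("US", lower_window=-1, upper_window=1)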
- add_events(events: Union[str, List[str]], lower_window: int = 0, upper_window: int = 0, regularization: Optional[float] = None, mode: str = 'additive')#
Add user specified events and their corresponding lower and upper windows and the regularization parameters into the NeuralProphet object.
- Parameters
events (str, list) – name or list of names of user specified events
lower_window (int) – the lower window for the events in the list of events
upper_window (int) – the upper window for the events in the list of events
regularization (float) – optional scale for regularization strength
mode (str) – additive (default) or multiplicative.
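Examples
A minimal sketch: declare the event names before fitting, then provide their dates in an events dataframe. The event names, dates, and df are illustrative placeholders:
>>> m = NeuralProphet()
>>> m = m.add_events(["playoff", "superbowl"], lower_window=-1, upper_window=1)
>>> events_df = pd.DataFrame({
...     "event": ["playoff", "superbowl"],
...     "ds": pd.to_datetime(["2022-12-17", "2023-02-12"]),
... })
>>> history_df = m.create_df_with_events(df, events_df)
>>> metrics = m.fit(history_df, freq="D")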
- add_future_regressor(name: str, regularization: Optional[float] = None, normalize: Union[str, bool] = 'auto', mode: str = 'additive')#
Add a regressor as lagged covariate with order 1 (scalar) or as known in advance (also scalar).
The dataframe passed to fit() and predict() will have a column with the specified name to be used as a regressor. When normalize=True, the regressor will be normalized unless it is binary.
Note
Future regressors have to be known for the entire forecast horizon, e.g. n_forecasts into the future.
- Parameters
name (string) – name of the regressor.
regularization (float) – optional scale for regularization strength
normalize (bool) –
optional, specify whether this regressor will be normalized prior to fitting.
Note
if auto, binary regressors will not be normalized.
mode (str) – additive (default) or multiplicative.
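Examples
A minimal sketch, assuming df additionally contains a numeric column named "temperature" known for the full forecast horizon:
>>> m = NeuralProphet()
>>> m = m.add_future_regressor("temperature")
>>> metrics = m.fit(df, freq="D")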
- add_lagged_regressor(names: Union[str, List[str]], n_lags: Union[int, Literal['auto', 'scalar']] = 'auto', regularization: Optional[float] = None, normalize: Union[bool, str] = 'auto')#
Add a covariate or list of covariate time series as additional lagged regressors to be used for fitting and predicting. The dataframe passed to fit and predict will have the column with the specified name to be used as lagged regressor. When normalize=True, the covariate will be normalized unless it is binary.
- Parameters
names (string or list) – name of the regressor/list of regressors.
n_lags (int or str) – previous regressors time steps to use as input in the predictor (covar order); if auto, time steps will be equivalent to the AR order (default); if scalar, all the regressors will only use last known value as input
regularization (float) – optional scale for regularization strength
normalize (bool) – optional, specify whether this regressor will be normalized prior to fitting. if auto, binary regressors will not be normalized.
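Examples
A minimal sketch, assuming df additionally contains an observed covariate column named "visits" (lagged regressors require an autoregressive model, i.e. n_lags > 0):
>>> m = NeuralProphet(n_lags=7, n_forecasts=1)
>>> m = m.add_lagged_regressor("visits")
>>> metrics = m.fit(df, freq="D")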
- add_seasonality(name: str, period: float, fourier_order: int, global_local: str = 'auto', condition_name: Optional[str] = None)#
Add a seasonal component with specified period, number of Fourier components, and regularization.
Increasing the number of Fourier components allows the seasonality to change more quickly (at risk of overfitting). Note: regularization and mode (additive/multiplicative) are set in the main init.
If condition_name is provided, the dataframe passed to fit and predict should have a column with the specified condition_name containing only zeros and ones, deciding when to apply seasonality. Floats between 0 and 1 can be used to apply seasonality partially.
- Parameters
name (string) – name of the seasonality component.
period (float) – number of days in one period.
fourier_order (int) – number of Fourier components to use.
global_local (str) – glocal modelling mode.
condition_name (string) – string name of the seasonality condition.
Examples
Adding a quarterly changing weekly seasonality to the model. First, add columns to df. The columns should contain only zeros and ones (or floats), deciding when to apply seasonality.
>>> df["summer"] = df["ds"].apply(lambda x: x.month in [6, 7, 8]) >>> df["fall"] = df["ds"].apply(lambda x: x.month in [9, 10, 11]) >>> df["winter"] = df["ds"].apply(lambda x: x.month in [12, 1, 2]) >>> df["spring"] = df["ds"].apply(lambda x: x.month in [3, 4, 5]) >>> df.head() ds y summer_week fall_week winter_week spring_week 0 2022-12-01 9.59 0 0 1 0 1 2022-12-02 8.52 0 0 1 0 2 2022-12-03 8.18 0 0 1 0 3 2022-12-04 8.07 0 0 1 0
- As a next step, add the seasonality to the model. With period=7, we specify that the seasonality changes weekly.
>>> m = NeuralProphet(weekly_seasonality=False)
>>> m.add_seasonality(name="weekly_summer", period=7, fourier_order=4, condition_name="summer")
>>> m.add_seasonality(name="weekly_winter", period=7, fourier_order=4, condition_name="winter")
>>> m.add_seasonality(name="weekly_spring", period=7, fourier_order=4, condition_name="spring")
>>> m.add_seasonality(name="weekly_fall", period=7, fourier_order=4, condition_name="fall")
- conformal_plot(df: pandas.core.frame.DataFrame, n_highlight: Optional[int] = 1, plotting_backend: Optional[str] = None)#
Plot conformal prediction intervals and quantile regression intervals.
- Parameters
df (pd.DataFrame) – conformal forecast dataframe when show_all_PI is set to True
n_highlight (Optional) – i-th step ahead forecast to use for statistics and plotting.
- conformal_predict(df: pandas.core.frame.DataFrame, calibration_df: pandas.core.frame.DataFrame, alpha: Union[float, Tuple[float, float]], method: str = 'naive', plotting_backend: Optional[str] = None, show_all_PI: bool = False, **kwargs) pandas.core.frame.DataFrame #
Apply a given conformal prediction technique to get the uncertainty prediction intervals (or q-hats). Then predict.
- Parameters
df (pd.DataFrame) – test dataframe containing columns ds, y, and optionally ID with data
calibration_df (pd.DataFrame) – holdout calibration dataframe for split conformal prediction
alpha (float or tuple) – user-specified significance level of the prediction interval, float if coverage error spread arbitrarily over left and right tails, tuple of two floats for different coverage error over left and right tails respectively
method (str) –
name of conformal prediction technique used
- Options
  - (default) naive: Naive or Absolute Residual
  - cqr: Conformalized Quantile Regression
plotting_backend (str) –
specifies the plotting backend for the nonconformity scores plot, if any
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. For some environments (colab, pycharm interpreter) plotly-resampler might not properly visualize the figures. In this case, consider switching to 'plotly-auto'.
  - plotly: Use the plotly backend for plotting
  - matplotlib: Use matplotlib backend for plotting
  - (default) None: Plotting backend is set automatically. Use plotly with resampling for jupyterlab notebooks and vscode notebooks. Automatically switch to plotly without resampling for all other environments.
show_all_PI (bool) – whether to return all prediction intervals (including quantile regression and conformal prediction)
kwargs (dict) – additional predict parameters for test df
- Returns
test dataframe with the conformal prediction intervals and evaluation dataframe if evaluate set to True
- Return type
pd.DataFrame, Optional[pd.DataFrame]
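Examples
A minimal sketch: hold out a calibration set, fit on the remainder, then conformalize the test predictions. The split fraction, alpha, and the held-out test_df are illustrative placeholders:
>>> train_df, calibration_df = m.split_df(df, valid_p=0.1)
>>> metrics = m.fit(train_df, freq="D")
>>> forecast = m.conformal_predict(test_df, calibration_df=calibration_df, alpha=0.1, method="naive")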
- create_df_with_events(df: pandas.core.frame.DataFrame, events_df: pandas.core.frame.DataFrame)#
Create a concatenated dataframe with the time series data along with the events data expanded.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
events_df (dict, pd.DataFrame) – containing columns ds and event
- Returns
columns y, ds and other user specified events
- Return type
dict, pd.DataFrame
- crossvalidation_split_df(df: pandas.core.frame.DataFrame, freq: str = 'auto', k: int = 5, fold_pct: float = 0.1, fold_overlap_pct: float = 0.5, global_model_cv_type: str = 'global-time')#
Splits timeseries data in k folds for crossvalidation.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
freq (str) –
data step sizes. Frequency of data recording,
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
k (int) – number of CV folds
fold_pct (float) – percentage of overall samples to be in each fold
fold_overlap_pct (float) – percentage of overlap between the validation folds.
global_model_cv_type (str) –
Type of crossvalidation to apply to the dict of time series.
- Options
  - (default) global-time: crossvalidation is performed according to a timestamp threshold
  - local: each episode will be crossvalidated locally (may cause time leakage among different episodes)
  - intersect: only the time intersection of all the episodes will be considered. A considerable amount of data may not be used. However, this approach guarantees an equal number of train/test samples for each episode.
- Returns
training data
validation data
- Return type
list of k tuples [(df_train, df_val), …]
See also
split_df
Splits timeseries df into train and validation sets.
double_crossvalidation_split_df
Splits timeseries data in two sets of k folds for crossvalidation on training and testing data.
Examples
>>> df1 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-01', periods = 10, freq = 'D'),
...                     'y': [9.59, 8.52, 8.18, 8.07, 7.89, 8.09, 7.84, 7.65, 8.71, 8.09]})
>>> df2 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-02', periods = 10, freq = 'D'),
...                     'y': [8.71, 8.09, 7.84, 7.65, 8.02, 8.52, 8.18, 8.07, 8.25, 8.30]})
>>> df3 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-03', periods = 10, freq = 'D'),
...                     'y': [7.67, 7.64, 7.55, 8.25, 8.32, 9.59, 8.52, 7.55, 8.25, 8.09]})
>>> df3
          ds     y
0 2022-12-03  7.67
1 2022-12-04  7.64
2 2022-12-05  7.55
3 2022-12-06  8.25
4 2022-12-07  8.32
5 2022-12-08  9.59
6 2022-12-09  8.52
7 2022-12-10  7.55
8 2022-12-11  8.25
9 2022-12-12  8.09
- You can create folds for a single dataframe.
>>> folds = m.crossvalidation_split_df(df3, k = 2, fold_pct = 0.2)
>>> folds
[(          ds     y
  0 2022-12-03  7.67
  1 2022-12-04  7.64
  2 2022-12-05  7.55
  3 2022-12-06  8.25
  4 2022-12-07  8.32
  5 2022-12-08  9.59
  6 2022-12-09  8.52,
           ds     y
  0 2022-12-10  7.55
  1 2022-12-11  8.25),
 (          ds     y
  0 2022-12-03  7.67
  1 2022-12-04  7.64
  2 2022-12-05  7.55
  3 2022-12-06  8.25
  4 2022-12-07  8.32
  5 2022-12-08  9.59
  6 2022-12-09  8.52
  7 2022-12-10  7.55,
           ds     y
  0 2022-12-11  8.25
  1 2022-12-12  8.09)]
- We can also create a df with many IDs.
>>> df1['ID'] = 'data1'
>>> df2['ID'] = 'data2'
>>> df3['ID'] = 'data3'
>>> df = pd.concat((df1, df2, df3))
When using the df with many IDs, there are three types of possible crossvalidation. The default crossvalidation is performed according to a timestamp threshold. In this case, we can have a different number of samples for each time series per fold. This approach prevents time leakage.
>>> folds = m.crossvalidation_split_df(df, k = 2, fold_pct = 0.2)
One can notice how each of the folds has a different number of samples for the validation set. Nonetheless, time leakage does not occur.
>>> folds[0][1]
          ds     y     ID
0 2022-12-10  8.09  data1
1 2022-12-10  8.25  data2
2 2022-12-11  8.30  data2
3 2022-12-10  7.55  data3
4 2022-12-11  8.25  data3
>>> folds[1][1]
          ds     y     ID
0 2022-12-11  8.30  data2
1 2022-12-11  8.25  data3
2 2022-12-12  8.09  data3
- In some applications, crossvalidating each of the time series locally may be more adequate.
>>> folds = m.crossvalidation_split_df(df, k = 2, fold_pct = 0.2, global_model_cv_type = 'local')
- In this way, we prevent a different number of validation samples in each fold.
>>> folds[0][1]
          ds     y     ID
0 2022-12-08  7.65  data1
1 2022-12-09  8.71  data1
2 2022-12-09  8.07  data2
3 2022-12-10  8.25  data2
4 2022-12-10  7.55  data3
5 2022-12-11  8.25  data3
>>> folds[1][1]
          ds     y     ID
0 2022-12-09  8.71  data1
1 2022-12-10  8.09  data1
2 2022-12-10  8.25  data2
3 2022-12-11  8.30  data2
4 2022-12-11  8.25  data3
5 2022-12-12  8.09  data3
The last type of global model crossvalidation gets the time intersection among all the time series used. There is no time leakage in this case, and we preserve the same number of samples per fold. The only drawback of this approach is that some of the samples may not be used (those not in the time intersection).
>>> folds = m.crossvalidation_split_df(df, k = 2, fold_pct = 0.2, global_model_cv_type = 'intersect')
>>> folds[0][1]
          ds     y     ID
0 2022-12-09  8.71  data1
1 2022-12-09  8.07  data2
2 2022-12-09  8.52  data3
>>> folds[1][1]
          ds     y     ID
0 2022-12-10  8.09  data1
1 2022-12-10  8.25  data2
2 2022-12-10  7.55  data3
- double_crossvalidation_split_df(df: pandas.core.frame.DataFrame, freq: str = 'auto', k: int = 5, valid_pct: float = 0.1, test_pct: float = 0.1)#
Splits timeseries data in two sets of k folds for crossvalidation on training and testing data.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
freq (str) –
data step sizes. Frequency of data recording,
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
k (int) – number of CV folds
valid_pct (float) – percentage of overall samples to be in validation
test_pct (float) – percentage of overall samples to be in test
- Returns
elements same as crossvalidation_split_df() returns
- Return type
tuple of k tuples [(folds_val, folds_test), …]
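Examples
A minimal sketch; the fold count and percentages are illustrative:
>>> splits = m.double_crossvalidation_split_df(df, k = 3, valid_pct = 0.1, test_pct = 0.1)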
- fit(df: pandas.core.frame.DataFrame, freq: str = 'auto', validation_df: Optional[pandas.core.frame.DataFrame] = None, epochs: Optional[int] = None, batch_size: Optional[int] = None, learning_rate: Optional[float] = None, early_stopping: bool = False, minimal: bool = False, metrics: Optional[Union[Dict, bool]] = None, progress: Optional[str] = 'bar', checkpointing: bool = False, continue_training: bool = False, num_workers: int = 0)#
Train, and potentially evaluate model.
Training/validation metrics may be distorted in case of auto-regression, if a large number of NaN values are present in df and/or validation_df.
- Parameters
df (pd.DataFrame) – containing columns ds, y, and optionally ID with all data
freq (str) –
Data step sizes. Frequency of data recording,
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
validation_df (pd.DataFrame, dict) – If provided, model performance will be evaluated after each training epoch over this data.
epochs (int) – Number of epochs to train for. If None, uses the number of epochs specified in the model config.
batch_size (int) – Batch size for training. If None, uses the batch size specified in the model config.
learning_rate (float) – Learning rate for training. If None, uses the learning rate specified in the model config.
early_stopping (bool) – Flag whether to use early stopping to stop training when training / validation loss is no longer improving.
minimal (bool) – Minimal mode deactivates metrics, the progress bar and checkpointing. Control this more granularly by using the metrics, progress and checkpointing parameters.
metrics (bool) – Flag whether to collect metrics during training. If None, uses the metrics specified in the model config.
progress (str) –
Flag whether to show a progress bar during training. If None, uses the progress specified in the model config.
- Options
  - (default) bar
  - plot
  - None
checkpointing (bool) – Flag whether to save checkpoints during training
continue_training (bool) – Flag whether to continue training from the last checkpoint
num_workers (int) – Number of workers for data loading. If 0, data will be loaded in the main process. Note: using multiple workers and therefore distributed training might significantly increase the training time since each batch needs to be copied to each worker for each epoch. Keeping all data on the main process might be faster for most datasets.
- Returns
metrics with training and potentially evaluation metrics
- Return type
pd.DataFrame
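Examples
A minimal training sketch with a validation holdout and early stopping; the frequency and split fraction are illustrative:
>>> m = NeuralProphet()
>>> df_train, df_val = m.split_df(df, valid_p=0.2)
>>> metrics = m.fit(df_train, freq="D", validation_df=df_val, early_stopping=True)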
- get_latest_forecast(fcst: pandas.core.frame.DataFrame, df_name: Optional[str] = None, include_history_data: bool = False, include_previous_forecasts: int = 0)#
Get the latest NeuralProphet forecast, optional including historical data.
- Parameters
fcst (pd.DataFrame, dict) – output of self.predict.
df_name (str) – ID of the time series that should be forecasted
include_history_data (bool) – specifies whether to include historical data
include_previous_forecasts (int) – specifies how many forecasts before latest forecast to include
- Returns
columns
ds
,y
, and [origin-<i>
]Note
where origin-<i> refers to the (i+1)-th latest prediction for this row’s datetime. e.g. origin-3 is the prediction for this datetime, predicted 4 steps before the last step. The very latest predcition is origin-0.
- Return type
pd.DataFrame
Examples
- We may get the df of the latest forecast:
>>> forecast = m.predict(df)
>>> df_forecast = m.get_latest_forecast(forecast)
- Number of steps before latest forecast could be included:
>>> df_forecast = m.get_latest_forecast(forecast, include_previous_forecasts=3)
- Historical data could be included, however be aware that the df could be large:
>>> df_forecast = m.get_latest_forecast(forecast, include_history_data=True)
- handle_negative_values(df: pandas.core.frame.DataFrame, handle: Optional[Union[str, int, float]] = 'remove', columns: Optional[List[str]] = None)#
Handle negative values in the given columns. If no columns or handling are provided, negative values in all numeric columns are removed.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
handle ({str, int, float}, optional) –
specified handling of negative values in the regressor column. Can be one of the following options:
- Options
  - (default) remove: Remove all negative values in the specified columns.
  - error: Raise an error in case of a negative value.
  - float or int: Replace negative values with the provided value.
columns (list of str, optional) – names of the columns to process
- Returns
input df with negative values handled
- Return type
pd.DataFrame
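Examples
A minimal sketch, replacing negative targets with zero; the column choice and replacement value are illustrative:
>>> df = m.handle_negative_values(df, handle=0.0, columns=["y"])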
- highlight_nth_step_ahead_of_each_forecast(step_number: Optional[int] = None)#
Set which forecast step to focus on for metrics evaluation and plotting.
- Parameters
step_number (int) –
i-th step ahead forecast to use for statistics and plotting.
Note
Set to None to reset.
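Examples
A minimal sketch for a multi-step model, focusing metrics and plots on the last forecast step; the step index is illustrative:
>>> m = NeuralProphet(n_lags=24, n_forecasts=12)
>>> m = m.highlight_nth_step_ahead_of_each_forecast(step_number=12)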
- make_future_dataframe(df: pandas.core.frame.DataFrame, events_df: Optional[pandas.core.frame.DataFrame] = None, regressors_df: Optional[pandas.core.frame.DataFrame] = None, periods: Optional[int] = None, n_historic_predictions: Union[bool, int] = False)#
Extends dataframe a number of periods (time steps) into the future.
Only use if you predict into the unknown future. New timestamps are added to the historic dataframe, with the 'y' column being NaN, as it remains to be predicted. Further, the given future events and regressors are added to the periods new timestamps. The returned dataframe will include historic data needed to additionally produce n_historic_predictions, for which there are historic observations of the series 'y'.
- Parameters
df (pd.DataFrame) – History to date. DataFrame containing all columns up to present
events_df (pd.DataFrame) – Future event occurrences corresponding to periods steps into future. Contains columns ds and event. The event column contains the name of the event.
regressors_df (pd.DataFrame) – Future regressor values corresponding to periods steps into future. Contains column ds and one column for each of the external regressors.
periods (int) – number of steps to extend the DataFrame into the future
n_historic_predictions (bool, int) – Includes historic data needed to predict n_historic_predictions timesteps, for which there are historic observations of the series 'y'. False: drop historic data except for needed inputs to predict future. True: include entire history.
- Returns
input df with ds extended into future, y set to None, with future events and regressors added.
- Return type
pd.DataFrame
Examples
>>> from neuralprophet import NeuralProphet
>>> m = NeuralProphet()
>>> # set the model to expect these events
>>> m = m.add_events(["playoff", "superbowl"])
>>> # create the data df with events
>>> history_df = m.create_df_with_events(df, events_df)
>>> metrics = m.fit(history_df, freq="D")
>>> # forecast with events known ahead
>>> future = m.make_future_dataframe(
...     history_df, events_df, periods=365, n_historic_predictions=180
... )
>>> # get 180 past and 365 future predictions.
>>> forecast = m.predict(df=future)
- plot(fcst: pandas.core.frame.DataFrame, df_name: Optional[str] = None, ax: Optional[matplotlib.axes._axes.Axes] = None, xlabel: str = 'ds', ylabel: str = 'y', figsize: Tuple[int, int] = (10, 6), forecast_in_focus: Optional[int] = None, plotting_backend: Optional[str] = None)#
Plot the NeuralProphet forecast, including history.
- Parameters
fcst (pd.DataFrame) – output of self.predict.
df_name (str) – ID from time series that should be plotted
ax (matplotlib axes) – optional, matplotlib axes on which to plot.
xlabel (string) – label name on X-axis
ylabel (string) – label name on Y-axis
figsize (tuple) – width, height in inches. default: (10, 6)
plotting_backend (str) –
optional, overwrites the default plotting backend.
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. For some environments (colab, pycharm interpreter) plotly-resampler might not properly visualize the figures. In this case, consider switching to 'plotly-auto'.
  - plotly: Use the plotly backend for plotting
  - plotly-static: Use the plotly backend to generate static svg
  - matplotlib: Use matplotlib backend for plotting
  - (default) None: Plotting backend is set automatically. Use plotly with resampling for jupyterlab notebooks and vscode notebooks. Automatically switch to plotly without resampling for all other environments.
forecast_in_focus (int) –
optional, i-th step ahead forecast to plot
Note
None (default): plot self.highlight_forecast_step_n by default
- plot_components(fcst: pandas.core.frame.DataFrame, df_name: str = '__df__', figsize: Optional[Tuple[int, int]] = None, forecast_in_focus: Optional[int] = None, plotting_backend: Optional[str] = None, components: Union[None, str, List[str]] = None, one_period_per_season: bool = False)#
Plot the NeuralProphet forecast components.
- Parameters
fcst (pd.DataFrame) – output of self.predict
df_name (str) – ID from time series that should be plotted
figsize (tuple) –
width, height in inches.
Note
None (default): automatic (10, 3 * npanel)
forecast_in_focus (int) –
optional, i-th step ahead forecast to plot
Note
None (default): plot self.highlight_forecast_step_n by default
plotting_backend (str) –
optional, overwrites the default plotting backend.
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. For some environments (colab, pycharm interpreter) plotly-resampler might not properly visualize the figures. In this case, consider switching to 'plotly-auto'.
  - plotly: Use the plotly backend for plotting
  - plotly-static: Use the plotly backend to generate static svg
  - matplotlib: Use matplotlib backend for plotting
  - (default) None: Plotting backend is set automatically. Use plotly with resampling for jupyterlab notebooks and vscode notebooks. Automatically switch to plotly without resampling for all other environments.
components (str or list, optional) –
name or list of names of components to plot
- Options
  - (default) None: All components the user set in the model configuration are plotted.
  - trend
  - seasonality: select all seasonalities
  - autoregression
  - lagged_regressors: select all lagged regressors
  - future_regressors: select all future regressors
  - events: select all events and country holidays
  - uncertainty
one_period_per_season (bool) – Plot one period per season, instead of the true seasonal components of the forecast.
- Returns
plot of NeuralProphet components
- Return type
matplotlib.axes.Axes
- plot_latest_forecast(fcst: pandas.core.frame.DataFrame, df_name: Optional[str] = None, ax: Optional[matplotlib.axes._axes.Axes] = None, xlabel: str = 'ds', ylabel: str = 'y', figsize: Tuple[int, int] = (10, 6), include_previous_forecasts: int = 0, plot_history_data: Optional[bool] = None, plotting_backend: Optional[str] = None)#
Plot the latest NeuralProphet forecast(s), including history.
- Parameters
fcst (pd.DataFrame) – output of self.predict.
df_name (str) – ID from time series that should be plotted
ax (matplotlib axes) – Optional, matplotlib axes on which to plot.
xlabel (str) – label name on X-axis
ylabel (str) – label name on Y-axis
figsize (tuple) – width, height in inches. default: (10, 6)
include_previous_forecasts (int) – number of previous forecasts to include in plot
plot_history_data (bool) – specifies whether to plot historical data
plotting_backend (str) –
optional, overwrites the default plotting backend.
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. For some environments (colab, pycharm interpreter) plotly-resampler might not properly visualize the figures. In this case, consider switching to 'plotly-auto'.
  - plotly: Use the plotly backend for plotting
  - plotly-static: Use the plotly backend to generate static svg
  - matplotlib: Use matplotlib backend for plotting
  - (default) None: Plotting backend is set automatically. Use plotly with resampling for jupyterlab notebooks and vscode notebooks. Automatically switch to plotly without resampling for all other environments.
- Returns
plot of NeuralProphet forecasting
- Return type
matplotlib.axes.Axes
- plot_parameters(weekly_start: int = 0, yearly_start: int = 0, figsize: Optional[Tuple[int, int]] = None, forecast_in_focus: Optional[int] = None, df_name: Optional[str] = None, plotting_backend: Optional[str] = None, quantile: Optional[float] = None, components: Union[None, str, List[str]] = None)#
Plot the NeuralProphet model parameters.
- Parameters
weekly_start (int) –
specifying the start day of the weekly seasonality plot.
Note
0 (default) starts the week on Sunday. 1 shifts by 1 day to Monday, and so on.
yearly_start (int) –
specifying the start day of the yearly seasonality plot.
Note
0 (default) starts the year on Jan 1. 1 shifts by 1 day to Jan 2, and so on.
df_name (str) – name of dataframe to refer to data params from original keys of train dataframes (used for local normalization in global modeling)
figsize (tuple) –
width, height in inches.
Note
None (default): automatic (10, 3 * npanel)
forecast_in_focus (int) –
optional, i-th step ahead forecast to plot
Note
None (default): plot self.highlight_forecast_step_n by default
plotting_backend (str) –
optional, overwrites the default plotting backend.
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. For some environments (colab, pycharm interpreter) plotly-resampler might not properly visualize the figures. In this case, consider switching to 'plotly-auto'.
  - plotly: Use the plotly backend for plotting
  - plotly-static: Use the plotly backend to generate static svg
  - matplotlib: Use matplotlib backend for plotting
  - (default) None: Plotting backend is set automatically. Use plotly with resampling for jupyterlab notebooks and vscode notebooks. Automatically switch to plotly without resampling for all other environments.
Note
For multiple time series and local modeling of at least one component, the df_name parameter is required.
quantile (float) –
The quantile for which the model parameters are to be plotted
Note
None (default): Parameters will be plotted for the median quantile.
components (str or list, optional) –
name or list of names of parameters to plot
- Options
  - (default) None: All parameters the user set in the model configuration are plotted.
  - trend
  - trend_rate_change
  - seasonality: select all seasonalities
  - autoregression
  - lagged_regressors: select all lagged regressors
  - events: select all events and country holidays
  - future_regressors: select all future regressors
- Returns
plot of the NeuralProphet parameters
- Return type
matplotlib.axes.Axes
- predict(df: pandas.core.frame.DataFrame, decompose: bool = True, raw: bool = False)#
Runs the model to make predictions.
Expects all data needed to be present in dataframe. If you are predicting into the unknown future and need to add future regressors or events, please prepare data with make_future_dataframe.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with data
decompose (bool) – whether to add individual components of forecast to the dataframe
raw (bool) –
specifies raw data
- Options
  - (default) False: returns forecasts sorted by target (highlighting forecast age)
  - True: return the raw forecasts sorted by forecast start date
- Returns
dependent on raw
Note
raw == True: columns ds, y, and [step<i>] where step<i> refers to the i-step-ahead prediction made at this row's datetime, e.g. step3 is the prediction for 3 steps into the future, predicted using information up to (excluding) this datetime.
raw == False: columns ds, y, trend and [yhat<i>] where yhat<i> refers to the i-step-ahead prediction for this row's datetime, e.g. yhat3 is the prediction for this datetime, predicted 3 steps ago, "3 steps old".
- Return type
pd.DataFrame
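Examples
A minimal end-to-end sketch; df and the forecast horizon are illustrative:
>>> m = NeuralProphet()
>>> metrics = m.fit(df, freq="D")
>>> future = m.make_future_dataframe(df, periods=30)
>>> forecast = m.predict(future)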
- predict_seasonal_components(df: pandas.core.frame.DataFrame, quantile: float = 0.5)#
Predict seasonality components
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
quantile (float) – the quantile in (0, 1) that needs to be predicted
- Returns
seasonal components with columns of name <seasonality component name>
- Return type
pd.DataFrame, dict
- predict_trend(df: pandas.core.frame.DataFrame, quantile: float = 0.5)#
Predict only trend component of the model.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
quantile (float) – the quantile in (0, 1) that needs to be predicted
- Returns
trend on prediction dates.
- Return type
pd.DataFrame, dict
- restore_trainer(accelerator: Optional[str] = None)#
If no accelerator was provided, use accelerator stored in model.
- set_plotting_backend(plotting_backend: str)#
Set plotting backend.
- Parameters
plotting_backend (str) –
Specifies the plotting backend to use for all plots. Can be configured individually for each plot.
- Options
  - plotly-resampler: Use the plotly backend for plotting in resample mode. This mode uses the plotly-resampler package to accelerate visualizing large data by resampling it. Only supported for jupyterlab notebooks and vscode notebooks.
  - plotly: Use the plotly backend for plotting
  - plotly-static: Use the plotly backend to generate static svg
  - matplotlib: Use matplotlib backend for plotting
- set_true_ar_for_eval(true_ar_weights: numpy.ndarray)#
Configures model to evaluate closeness of AR weights to true weights.
- Parameters
true_ar_weights (np.array) – true AR-parameters, if known.
- split_df(df: pandas.core.frame.DataFrame, freq: str = 'auto', valid_p: float = 0.2, local_split: bool = False)#
Splits timeseries df into train and validation sets. Prevents leakage of targets. Sharing/overbleed of inputs can be configured. Also performs basic data checks and fills in missing data, unless impute_missing is set to False.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with all data
freq (str) –
data step sizes. Frequency of data recording,
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
valid_p (float) – fraction of data to use for holdout validation set, targets will still never be shared.
local_split (bool) – Each dataframe will be split according to valid_p locally (in case of dict of dataframes)
- Returns
training data
validation data
- Return type
tuple of two pd.DataFrames
See also
crossvalidation_split_df
Splits timeseries data in k folds for crossvalidation.
double_crossvalidation_split_df
Splits timeseries data in two sets of k folds for crossvalidation on training and testing data.
Examples
>>> df1 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-01', periods = 5, freq = 'D'),
...                     'y': [9.59, 8.52, 8.18, 8.07, 7.89]})
>>> df2 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-09', periods = 5, freq = 'D'),
...                     'y': [8.71, 8.09, 7.84, 7.65, 8.02]})
>>> df3 = pd.DataFrame({'ds': pd.date_range(start = '2022-12-09', periods = 5, freq = 'D'),
...                     'y': [7.67, 7.64, 7.55, 8.25, 8.3]})
>>> df3
          ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55
3 2022-12-12  8.25
4 2022-12-13  8.30
You can split a single dataframe, which also may contain NaN values. Please be aware this may affect training/validation performance.
>>> (df_train, df_val) = m.split_df(df3, valid_p = 0.2)
>>> df_train
          ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55
3 2022-12-12  8.25
>>> df_val
          ds     y
0 2022-12-13  8.3
- One can define a single df with many time series identified by an ‘ID’ column.
>>> df1['ID'] = 'data1'
>>> df2['ID'] = 'data2'
>>> df3['ID'] = 'data3'
>>> df = pd.concat((df1, df2, df3))
You can use a df with many IDs (especially useful for global modeling), which will account for the time range of the whole group of time series as default.
>>> (df_train, df_val) = m.split_df(df, valid_p = 0.2)
>>> df_train
           ds     y     ID
0  2022-12-01  9.59  data1
1  2022-12-02  8.52  data1
2  2022-12-03  8.18  data1
3  2022-12-04  8.07  data1
4  2022-12-05  7.89  data1
5  2022-12-09  8.71  data2
6  2022-12-10  8.09  data2
7  2022-12-11  7.84  data2
8  2022-12-09  7.67  data3
9  2022-12-10  7.64  data3
10 2022-12-11  7.55  data3
>>> df_val
          ds     y     ID
0 2022-12-12  7.65  data2
1 2022-12-13  8.02  data2
2 2022-12-12  8.25  data3
3 2022-12-13  8.30  data3
In some applications, splitting locally each time series may be helpful. In this case, one should set local_split to True.
>>> (df_train, df_val) = m.split_df(df, valid_p = 0.2, local_split = True)
>>> df_train
           ds     y     ID
0  2022-12-01  9.59  data1
1  2022-12-02  8.52  data1
2  2022-12-03  8.18  data1
3  2022-12-04  8.07  data1
4  2022-12-09  8.71  data2
5  2022-12-10  8.09  data2
6  2022-12-11  7.84  data2
7  2022-12-12  7.65  data2
8  2022-12-09  7.67  data3
9  2022-12-10  7.64  data3
10 2022-12-11  7.55  data3
11 2022-12-12  8.25  data3
>>> df_val
          ds     y     ID
0 2022-12-05  7.89  data1
1 2022-12-13  8.02  data2
2 2022-12-13  8.30  data3
- test(df: pandas.core.frame.DataFrame, verbose: bool = True)#
Evaluate model on holdout data.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID with holdout data
verbose (bool) – If True, prints the test results.
- Returns
evaluation metrics
- Return type
pd.DataFrame
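Examples
A minimal evaluation sketch on a holdout split; the split fraction is illustrative:
>>> df_train, df_test = m.split_df(df, valid_p=0.2)
>>> metrics = m.fit(df_train, freq="D")
>>> test_metrics = m.test(df_test)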