Core Module Documentation
- class neuralprophet.forecaster.NeuralProphet(growth='linear', changepoints=None, n_changepoints=10, changepoints_range=0.9, trend_reg=0, trend_reg_threshold=False, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', seasonality_mode='additive', seasonality_reg=0, n_forecasts=1, n_lags=0, num_hidden_layers=0, d_hidden=None, ar_reg=None, learning_rate=None, epochs=None, batch_size=None, loss_func='Huber', optimizer='AdamW', newer_samples_weight=2, newer_samples_start=0.0, impute_missing=True, collect_metrics=True, normalize='auto', global_normalization=False, global_time_normalization=True, unknown_data_normalization=False)
NeuralProphet forecaster.
A simple yet powerful forecaster that models trend, seasonality, events, holidays, auto-regression, lagged covariates, and future-known regressors. Can be regularized and configured to model nonlinear relationships.
- Parameters
growth ({'off', 'linear' or 'discontinuous'}, default 'linear') –
Set use of trend growth type.
- Options:
off: no trend.
(default) linear: fits a piece-wise linear trend with n_changepoints + 1 segments.
discontinuous: for advanced users only. Not a conventional trend; allows arbitrary jumps at each trend changepoint.
changepoints ({list of str, list of np.datetimes or np.array of np.datetimes}, optional) –
Manually set dates at which to include potential changepoints.
Note
Does not accept np.array of np.str. If not specified, potential changepoints are selected automatically.
n_changepoints (int) –
Number of potential trend changepoints to include.
Note
Changepoints are selected uniformly from the first changepoints_range proportion of the history. Ignored if a manual changepoints list is supplied.
changepoints_range (float) –
Proportion of history in which trend changepoints will be estimated.
e.g. set to 0.8 to allow changepoints only in the first 80% of training data. Ignored if a manual changepoints list is supplied.
trend_reg (float, optional) –
Parameter modulating the flexibility of the automatic changepoint selection.
Note
Large values (~1-100) will limit the variability of changepoints. Small values (~0.001-1.0) will allow changepoints to change faster. Default: 0 will fully fit a trend to each segment.
trend_reg_threshold (bool, optional) –
Allowance for trend to change without regularization.
- Options
True: automatically set to a value that leads to a smooth trend.
(default) False: all changes in changepoints are regularized.
yearly_seasonality (bool, int) –
Fit yearly seasonality.
- Options
True or False
auto: set automatically
value: number of Fourier/linear terms to generate
weekly_seasonality (bool, int) –
Fit weekly seasonality.
- Options
True or False
auto: set automatically
value: number of Fourier/linear terms to generate
daily_seasonality (bool, int) –
Fit daily seasonality.
- Options
True or False
auto: set automatically
value: number of Fourier/linear terms to generate
seasonality_mode (str) –
Specifies mode of seasonality.
- Options
(default) additive
multiplicative
seasonality_reg (float, optional) –
Parameter modulating the strength of the seasonality model.
Note
Smaller values (~0.1-1) allow the model to fit larger seasonal fluctuations; larger values (~1-100) dampen the seasonality. Default: 0, no regularization.
n_lags (int) – Previous time series steps to include in auto-regression. Aka AR-order
ar_reg (float, optional) –
How much sparsity to induce in the AR-coefficients.
Note
Large values (~1-100) will limit the number of nonzero coefficients dramatically. Small values (~0.001-1.0) will allow more non-zero coefficients. Default: None, no regularization of coefficients.
n_forecasts (int) – Number of steps ahead of prediction time step to forecast.
num_hidden_layers (int, optional) – number of hidden layers to include in AR-Net (defaults to 0)
d_hidden (int, optional) – dimension of hidden layers of the AR-Net. Ignored if num_hidden_layers == 0.
learning_rate (float) –
Maximum learning rate setting for 1cycle policy scheduler.
Note
Default None: automatically sets the learning_rate based on a learning rate range test. For manual user input, try values ~0.001-10.
epochs (int) –
Number of epochs (complete iterations over dataset) to train model.
Note
Default None: automatically sets the number of epochs based on dataset size. For best results, also leave batch_size set to None. For manual values, try ~5-500.
batch_size (int) –
Number of samples per mini-batch.
If not provided, batch_size is approximated based on dataset size. For manual values, try ~8-1024. For best results, also leave epochs set to None.
newer_samples_weight (float, default 2.0) –
Sets factor by which the model fit is skewed towards more recent observations.
Controls the factor by which final samples are weighted more compared to initial samples. Applies a positional weighting to each sample’s loss value.
e.g. newer_samples_weight = 2: final samples are weighted twice as much as initial samples.
newer_samples_start (float, default 0.0) –
Sets beginning of ‘newer’ samples as fraction of training data.
Throughout the range of ‘newer’ samples, the weight is increased from 1.0/newer_samples_weight initially to 1.0 at the end, in a monotonically increasing function (cosine from pi to 2*pi).
loss_func (str, torch.nn.functional.loss) –
Type of loss to use:
- Options
(default) Huber: Huber loss function
MSE: Mean Squared Error loss function
MAE: Mean Absolute Error loss function
torch.nn.functional.loss.: loss or callable for custom loss, e.g. L1-Loss
Examples
>>> from neuralprophet import NeuralProphet
>>> import torch
>>> import torch.nn as nn
>>> m = NeuralProphet(loss_func=torch.nn.L1Loss)
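For intuition about the default option: a Huber loss is quadratic for small residuals and linear for large ones, so single outliers pull on the fit less than they would under MSE. The following is a plain-Python illustration, not NeuralProphet's implementation. The function name huber_loss and the threshold delta=1.0 are assumptions made for this sketch.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over paired sequences (illustrative only)."""
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        residual = abs(target - prediction)
        if residual <= delta:
            # quadratic branch: behaves like (half of) squared error
            total += 0.5 * residual * residual
        else:
            # linear branch: grows like absolute error for large residuals
            total += delta * (residual - 0.5 * delta)
    return total / len(y_true)


small = huber_loss([1.0, 2.0], [1.5, 2.0])  # residuals 0.5 and 0.0
large = huber_loss([0.0], [10.0])           # residual 10.0, linear branch
```

With delta=1.0, a residual of 10 contributes 9.5 rather than the 50 that half the squared error would give.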
collect_metrics (list of str, bool) –
Set metrics to compute.
Valid: [mae, rmse, mse]
- Options
(default) True: [mae, rmse]
False: No metrics
impute_missing (bool) –
whether to automatically impute missing dates/values
Note
imputation follows a linear method up to 10 missing values, more are filled with trend.
normalize (str) –
Type of normalization to apply to the time series.
- Options
off: bypasses data normalization
(default, binary timeseries) minmax: scales the minimum value to 0.0 and the maximum value to 1.0
standardize: zero-centers and divides by the standard deviation
(default) soft: scales the minimum value to 0.0 and the 95th quantile to 1.0
soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
global_normalization (bool) –
Activation of global normalization
- Options
True: dict of dataframes is used as global_time_normalization
(default) False: local normalization
global_time_normalization (bool) –
Specifies global time normalization
- Options
(default) True: only valid in case of global modeling local normalization
False: set time data_params locally
unknown_data_normalization (bool) –
Specifies unknown data normalization
- Options
True: test data is normalized with global data params even if trained with local data params (global modeling with local normalization)
(default) False: no global modeling with local normalization
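The positional loss weighting described under newer_samples_weight and newer_samples_start can be pictured with a small sketch. This is one reading of that description, not the library's code: the function name newer_sample_weights is hypothetical, and giving samples before the 'newer' range the constant weight 1.0/newer_samples_weight is an assumption.

```python
import math


def newer_sample_weights(n_samples, newer_samples_weight=2.0, newer_samples_start=0.0):
    """Per-sample loss weights following the cosine ramp described above."""
    low = 1.0 / newer_samples_weight
    start = int(newer_samples_start * n_samples)  # index of the first 'newer' sample
    n_newer = n_samples - start
    weights = []
    for i in range(n_samples):
        if i < start:
            # assumption: samples before the 'newer' range keep the lowest weight
            weights.append(low)
            continue
        # progress runs from 0.0 at the first 'newer' sample to 1.0 at the last
        progress = (i - start) / (n_newer - 1) if n_newer > 1 else 1.0
        # cos rises from -1 at pi to +1 at 2*pi; rescale that to [0, 1]
        ramp = 0.5 * (1.0 + math.cos(math.pi + math.pi * progress))
        weights.append(low + (1.0 - low) * ramp)
    return weights


weights = newer_sample_weights(5, newer_samples_weight=2.0, newer_samples_start=0.0)
```

With newer_samples_weight=2 the first weight is 0.5 and the last is 1.0, so the final sample counts twice as much as the initial one.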
- add_country_holidays(country_name, lower_window=0, upper_window=0, regularization=None, mode='additive')
Add a country to the NeuralProphet object to include country-specific holidays, and create the corresponding configs such as lower and upper windows and the regularization parameters.
- Parameters
country_name (string) – name of the country
lower_window (int) – the lower window for all the country holidays
upper_window (int) – the upper window for all the country holidays
regularization (float) – optional scale for regularization strength
mode (str) – additive (default) or multiplicative.
- add_events(events, lower_window=0, upper_window=0, regularization=None, mode='additive')
Add user-specified events, with their corresponding lower and upper windows and regularization parameters, to the NeuralProphet object.
- Parameters
events (str, list) – name or list of names of user specified events
lower_window (int) – the lower window for the events in the list of events
upper_window (int) – the upper window for the events in the list of events
regularization (float) – optional scale for regularization strength
mode (str) – additive (default) or multiplicative.
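To make the window parameters concrete, here is a small sketch of how an event date might be widened into a range of affected days. It assumes a Prophet-style convention in which lower_window is zero or negative (days before the event) and upper_window is zero or positive (days after). The helper expand_event_window is hypothetical and is not part of NeuralProphet.

```python
from datetime import date, timedelta


def expand_event_window(event_dates, lower_window=0, upper_window=0):
    """Map each affected date to its offset (in days) from the event date."""
    affected = {}
    for event_day in event_dates:
        # offsets run from lower_window (<= 0) through upper_window (>= 0), inclusive
        for offset in range(lower_window, upper_window + 1):
            # if two windows overlap, the later event in the list wins
            affected[event_day + timedelta(days=offset)] = offset
    return affected


window = expand_event_window([date(2022, 12, 25)], lower_window=-1, upper_window=1)
```

Under these assumptions, the single event on 2022-12-25 covers 2022-12-24 through 2022-12-26, with offsets -1, 0 and 1.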
- add_future_regressor(name, regularization=None, normalize='auto', mode='additive')
Add a regressor as lagged covariate with order 1 (scalar) or as known in advance (also scalar). The dataframe passed to fit() and predict() will have a column with the specified name to be used as a regressor. When normalize=True, the regressor will be normalized unless it is binary.
- Parameters
name (string) – name of the regressor.
regularization (float) – optional scale for regularization strength
normalize (bool) –
optional, specify whether this regressor will be normalized prior to fitting.
Note
if auto, binary regressors will not be normalized.
mode (str) – additive (default) or multiplicative.
- add_lagged_regressor(names, regularization=None, normalize='auto', only_last_value=False)
Add a covariate or list of covariate time series as additional lagged regressors to be used for fitting and predicting. The dataframe passed to fit and predict will have the column with the specified name to be used as lagged regressor. When normalize=True, the covariate will be normalized unless it is binary.
- Parameters
names (string or list) – name of the regressor/list of regressors.
regularization (float) – optional scale for regularization strength
normalize (bool) – optional, specify whether this regressor will be normalized prior to fitting. If auto, binary regressors will not be normalized.
only_last_value (bool) –
specifies last value handling
- Options
(default) False: use same number of lags as auto-regression
True: only use last known value as input
- add_seasonality(name, period, fourier_order)
Add a seasonal component with specified period, number of Fourier components, and regularization.
Increasing the number of Fourier components allows the seasonality to change more quickly (at risk of overfitting). Note: regularization and mode (additive/multiplicative) are set in the main init.
- Parameters
name (string) – name of the seasonality component.
period (float) – number of days in one period.
fourier_order (int) – number of Fourier components to use.
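As a rough illustration of what period and fourier_order control, the sketch below builds the standard Fourier basis (one sine and one cosine per order) for a single time point. The function name fourier_features and the ordering of the returned terms are assumptions for this example; the library's internal feature layout may differ.

```python
import math


def fourier_features(t_days, period, fourier_order):
    """Return [sin_1, cos_1, ..., sin_K, cos_K] for one time point."""
    features = []
    for k in range(1, fourier_order + 1):
        # the k-th harmonic completes k full cycles per period
        angle = 2.0 * math.pi * k * t_days / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


# A quarter of the way through a 7-day period, using 2 Fourier components.
row = fourier_features(t_days=1.75, period=7.0, fourier_order=2)
```

Each additional order adds a faster-oscillating pair of terms, which is why a higher fourier_order lets the fitted seasonality change more quickly.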
- create_df_with_events(df, events_df)
Create a concatenated dataframe with the time series data along with the events data expanded.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
events_df (dict, pd.DataFrame) – containing columns ds and event
- Returns
columns y, ds and other user specified events
- Return type
dict, pd.DataFrame
- crossvalidation_split_df(df, freq='auto', k=5, fold_pct=0.1, fold_overlap_pct=0.5)
Splits timeseries data in k folds for crossvalidation.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
freq (str) –
Data step sizes. Frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
k (int) – number of CV folds
fold_pct (float) – percentage of overall samples to be in each fold
fold_overlap_pct (float) – percentage of overlap between the validation folds.
- Returns
training data
validation data
- Return type
list of k tuples [(df_train, df_val), …]
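The interplay of k, fold_pct and fold_overlap_pct can be sketched with plain index arithmetic. This is a simplified illustration under stated assumptions, not the library's algorithm: it ignores n_lags and n_forecasts, assumes validation folds are stacked at the end of the series, and uses a hypothetical helper named crossvalidation_indices.

```python
def crossvalidation_indices(n_samples, k=5, fold_pct=0.1, fold_overlap_pct=0.5):
    """Return k (train_range, val_range) pairs of half-open index ranges."""
    fold_size = max(1, int(fold_pct * n_samples))
    overlap = int(fold_overlap_pct * fold_size)
    step = fold_size - overlap  # how far consecutive validation folds are shifted
    if step < 1:
        raise ValueError("fold_overlap_pct leaves no shift between folds")
    first_val_start = n_samples - fold_size - (k - 1) * step
    if first_val_start < 1:
        raise ValueError("not enough samples for the requested folds")
    folds = []
    for i in range(k):
        val_start = first_val_start + i * step
        val_end = val_start + fold_size
        # training data is everything strictly before the validation window
        folds.append(((0, val_start), (val_start, val_end)))
    return folds


folds = crossvalidation_indices(100, k=5, fold_pct=0.1, fold_overlap_pct=0.5)
```

For 100 samples this yields validation windows of 10 samples, each shifted by 5, with the last window ending at the final sample. Training data always precedes its validation window, so no future targets leak into training.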
- double_crossvalidation_split_df(df, freq='auto', k=5, valid_pct=0.1, test_pct=0.1)
Splits timeseries data in two sets of k folds for crossvalidation on training and testing data.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
freq (str) –
Data step sizes. Frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
k (int) – number of CV folds
valid_pct (float) – percentage of overall samples to be in validation
test_pct (float) – percentage of overall samples to be in test
- Returns
elements same as crossvalidation_split_df() returns
- Return type
tuple of k tuples [(folds_val, folds_test), …]
- fit(df, freq='auto', validation_df=None, progress='bar', minimal=False)
Train, and potentially evaluate model.
- Parameters
df (pd.DataFrame, dict) – containing columns ds, y with all data
freq (str) –
Data step sizes. Frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
validation_df (pd.DataFrame, dict) – if provided, model performance will be evaluated after each training epoch over this data.
epochs (int) – number of epochs to train (overrides default setting). Default: if not specified, uses self.epochs
progress (str) –
Method of progress display
- Options
(default) bar: display updating progress bar (tqdm)
print: print out progress (fallback option)
plot: plot a live updating graph of the training loss, requires [live] install or livelossplot package installed.
plot-all: extended to all recorded metrics.
minimal (bool) – whether to train without any printouts or metrics collection
- Returns
metrics with training and potentially evaluation metrics
- Return type
pd.DataFrame
- highlight_nth_step_ahead_of_each_forecast(step_number=None)
Set which forecast step to focus on for metrics evaluation and plotting.
- Parameters
step_number (int) – i-th step ahead forecast to use for statistics and plotting.
- plot(fcst, ax=None, xlabel='ds', ylabel='y', figsize=(10, 6))
Plot the NeuralProphet forecast, including history.
- Parameters
fcst (pd.DataFrame) – output of self.predict.
ax (matplotlib axes) – optional, matplotlib axes on which to plot.
xlabel (string) – label name on X-axis
ylabel (string) – label name on Y-axis
figsize (tuple) – width, height in inches. default: (10, 6)
- plot_components(fcst, figsize=None, residuals=False)
Plot the NeuralProphet forecast components.
- Parameters
fcst (pd.DataFrame) – output of self.predict
figsize (tuple) –
width, height in inches.
Note
None (default): automatic (10, 3 * npanel)
- Returns
plot of NeuralProphet components
- Return type
matplotlib.axes.Axes
- plot_last_forecast(fcst, ax=None, xlabel='ds', ylabel='y', figsize=(10, 6), include_previous_forecasts=0, plot_history_data=None)
Plot the NeuralProphet forecast, including history.
- Parameters
fcst (pd.DataFrame) – output of self.predict.
ax (matplotlib axes) – Optional, matplotlib axes on which to plot.
xlabel (str) – label name on X-axis
ylabel (str) – label name on Y-axis
figsize (tuple) – width, height in inches. default: (10, 6)
include_previous_forecasts (int) – number of previous forecasts to include in plot
plot_history_data (bool) – specifies plot of historical data
- Returns
plot of NeuralProphet forecasting
- Return type
matplotlib.axes.Axes
- plot_parameters(weekly_start=0, yearly_start=0, figsize=None, df_name=None)
Plot the NeuralProphet forecast components.
- Parameters
weekly_start (int) –
specifying the start day of the weekly seasonality plot.
Note
0 (default) starts the week on Sunday. 1 shifts by 1 day to Monday, and so on.
yearly_start (int) –
specifying the start day of the yearly seasonality plot.
Note
0 (default) starts the year on Jan 1. 1 shifts by 1 day to Jan 2, and so on.
df_name (str) – name of dataframe to refer to data params from original keys of train dataframes (used for local normalization in global modeling)
figsize (tuple) –
width, height in inches.
Note
None (default): automatic (10, 3 * npanel)
- Returns
plot of NeuralProphet forecasting
- Return type
matplotlib.axes.Axes
- predict(df, decompose=True, raw=False)
Runs the model to make predictions.
Expects all data needed to be present in dataframe. If you are predicting into the unknown future and need to add future regressors or events, please prepare data with make_future_dataframe.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with data
decompose (bool) – whether to add individual components of forecast to the dataframe
raw (bool) –
specifies raw data
- Options
(default) False: returns forecasts sorted by target (highlighting forecast age)
True: returns the raw forecasts sorted by forecast start date
- Returns
dependent on raw
Note
raw == True: columns ds, y, and [step<i>] where step<i> refers to the i-step-ahead prediction made at this row’s datetime, e.g. step3 is the prediction for 3 steps into the future, predicted using information up to (excluding) this datetime.
raw == False: columns ds, y, trend and [yhat<i>] where yhat<i> refers to the i-step-ahead prediction for this row’s datetime, e.g. yhat3 is the prediction for this datetime, predicted 3 steps ago, “3 steps old”.
- Return type
pd.DataFrame
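The difference between the two layouts is easiest to see on a toy example. The sketch below regroups forecasts stored per forecast origin (similar in spirit to raw == True) into rows keyed by target (similar to raw == False), so that yhat<i> on a row is the forecast for that row made i steps earlier. The helper name align_forecasts_by_target and the exact index convention are assumptions for illustration; the library's row alignment may differ by an offset.

```python
def align_forecasts_by_target(raw, n_rows):
    """Regroup forecasts from 'per origin' to 'per target'.

    raw maps an origin index to [1-step, 2-step, ...] forecasts made from it.
    Returns one dict per row with keys 'yhat1', 'yhat2', ...; gaps are None.
    """
    n_forecasts = max(len(steps) for steps in raw.values())
    rows = []
    for target in range(n_rows):
        row = {}
        for i in range(1, n_forecasts + 1):
            origin = target - i  # this forecast is "i steps old"
            steps = raw.get(origin)
            has_value = steps is not None and len(steps) >= i
            row[f"yhat{i}"] = steps[i - 1] if has_value else None
        rows.append(row)
    return rows


# Forecasts made from origins 0, 1 and 2, each looking 2 steps ahead.
raw = {0: [10.0, 11.0], 1: [20.0, 21.0], 2: [30.0, 31.0]}
table = align_forecasts_by_target(raw, n_rows=5)
```

Row 2 of the result holds yhat1 = 20.0 (made one step earlier, from origin 1) and yhat2 = 11.0 (made two steps earlier, from origin 0).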
- predict_seasonal_components(df)
Predict seasonality components
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
- Returns
seasonal components with columns of name <seasonality component name>
- Return type
pd.DataFrame, dict
- predict_trend(df)
Predict only trend component of the model.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
- Returns
trend on prediction dates.
- Return type
pd.DataFrame, dict
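For intuition about the trend component itself, a continuous piece-wise linear trend with n changepoints has n + 1 segments, each with its own slope. The sketch below evaluates such a trend at a single time point. It is a generic formulation with hypothetical names (piecewise_linear_trend, slope_deltas) and does not reproduce the model's fitted parameters or internal time scaling.

```python
def piecewise_linear_trend(t, changepoints, base_slope, slope_deltas, offset=0.0):
    """Evaluate a continuous piece-wise linear trend at time t.

    changepoints: sorted times at which the slope changes.
    slope_deltas: slope adjustment that applies from each changepoint onward.
    """
    value = offset + base_slope * t
    for changepoint, delta in zip(changepoints, slope_deltas):
        if t > changepoint:
            # past this changepoint the slope has changed by delta
            value += delta * (t - changepoint)
    return value


# Two changepoints give three segments with slopes 1.0, 0.0 and 2.0.
changepoints = [0.3, 0.6]
slope_deltas = [-1.0, 2.0]
trend_values = [
    piecewise_linear_trend(t, changepoints, 1.0, slope_deltas)
    for t in (0.3, 0.6, 1.0)
]
```

Because each adjustment is multiplied by the distance from its changepoint, the trend stays continuous. A discontinuous variant would additionally allow a jump at each changepoint.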
- set_true_ar_for_eval(true_ar_weights)
Configures model to evaluate closeness of AR weights to true weights.
- Parameters
true_ar_weights (np.array) – true AR-parameters, if known.
- split_df(df, freq='auto', valid_p=0.2, local_split=False)
Splits timeseries df into train and validation sets.
Prevents leakage of targets. Sharing/Overbleed of inputs can be configured. Also performs basic data checks and fills in missing data.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
freq (str) –
Data step sizes. Frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
valid_p (float) – fraction of data to use for holdout validation set, targets will still never be shared.
local_split (bool) – each dataframe will be split according to valid_p locally (in case of dict of dataframes)
- Returns
training data
validation data
- Return type
tuple of two pd.DataFrames
See also
crossvalidation_split_df
Splits timeseries data in k folds for crossvalidation.
double_crossvalidation_split_df
Splits timeseries data in two sets of k folds for crossvalidation on training and testing data.
Examples
>>> import pandas as pd
>>> from neuralprophet import NeuralProphet
>>> m = NeuralProphet()
>>> df1 = pd.DataFrame({'ds': pd.date_range(start='2022-12-01', periods=5,
...                     freq='D'), 'y': [9.59, 8.52, 8.18, 8.07, 7.89]})
>>> df2 = pd.DataFrame({'ds': pd.date_range(start='2022-12-09', periods=5,
...                     freq='D'), 'y': [8.71, 8.09, 7.84, 7.65, 8.02]})
>>> df3 = pd.DataFrame({'ds': pd.date_range(start='2022-12-09', periods=5,
...                     freq='D'), 'y': [7.67, 7.64, 7.55, 8.25, 8.3]})
>>> df3
          ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55
3 2022-12-12  8.25
4 2022-12-13  8.30
- One can define a dict with many time series.
>>> df_dict = {'data1': df1, 'data2': df2, 'data3': df3}
- You can split a single dataframe.
>>> (df_train, df_val) = m.split_df(df3, valid_p=0.2)
>>> df_train
          ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55
3 2022-12-12  8.25
>>> df_val
          ds    y
0 2022-12-13  8.3
- You can also use a dict of dataframes (especially useful for global modeling), which by default accounts for the time range of the whole group of time series.
>>> (df_dict_train, df_dict_val) = m.split_df(df_dict, valid_p=0.2)
>>> df_dict_train
{'data1':           ds     y
0 2022-12-01  9.59
1 2022-12-02  8.52
2 2022-12-03  8.18
3 2022-12-04  8.07
4 2022-12-05  7.89, 'data2':           ds     y
0 2022-12-09  8.71
1 2022-12-10  8.09
2 2022-12-11  7.84, 'data3':           ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55}
>>> df_dict_val
{'data2':           ds     y
0 2022-12-12  7.65
1 2022-12-13  8.02, 'data3':           ds     y
0 2022-12-12  8.25
1 2022-12-13  8.30}
- In some applications, splitting each time series locally may be helpful. In this case, set local_split to True.
>>> (df_dict_train, df_dict_val) = m.split_df(df_dict, valid_p=0.2,
...                                           local_split=True)
>>> df_dict_train
{'data1':           ds     y
0 2022-12-01  9.59
1 2022-12-02  8.52
2 2022-12-03  8.18
3 2022-12-04  8.07, 'data2':           ds     y
0 2022-12-09  8.71
1 2022-12-10  8.09
2 2022-12-11  7.84
3 2022-12-12  7.65, 'data3':           ds     y
0 2022-12-09  7.67
1 2022-12-10  7.64
2 2022-12-11  7.55
3 2022-12-12  8.25}
>>> df_dict_val
{'data1':           ds     y
0 2022-12-05  7.89, 'data2':           ds     y
0 2022-12-13  8.02, 'data3':           ds    y
0 2022-12-13  8.3}
- test(df)
Evaluate model on holdout data.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with holdout data
- Returns
evaluation metrics
- Return type
pd.DataFrame