Core Module Documentation

class neuralprophet.df_utils.ShiftScale(shift: float = 0.0, scale: float = 1.0)

neuralprophet.df_utils.add_missing_dates_nan(df, freq)

Fills missing datetimes in ‘ds’, with NaN for all other columns

Parameters
  • df (pd.DataFrame) – with column ‘ds’ datetimes

  • freq (str) – Data step size, i.e. the frequency of data recording. Any valid frequency for pd.date_range, such as ‘D’ or ‘M’

Returns

dataframe without date gaps, with NaN values in the added rows
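The gap-filling behaviour described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation; `fill_date_gaps` is a hypothetical helper name, and it assumes the input has a ‘ds’ column that can be parsed as datetimes.

```python
import pandas as pd


def fill_date_gaps(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical helper: add a row for every missing 'ds' step, NaN elsewhere."""
    df = df.copy()
    df["ds"] = pd.to_datetime(df["ds"])
    # Complete, gap-free range between the first and last timestamp.
    full_range = pd.date_range(start=df["ds"].min(), end=df["ds"].max(), freq=freq)
    # Reindexing on 'ds' adds the missing dates; other columns become NaN there.
    filled = df.set_index("ds").reindex(full_range)
    filled.index.name = "ds"
    return filled.reset_index()


df = pd.DataFrame({"ds": ["2021-01-01", "2021-01-02", "2021-01-05"], "y": [1.0, 2.0, 5.0]})
filled = fill_date_gaps(df, freq="D")
```

Here `filled` has five daily rows, and ‘y’ is NaN for the two dates that were absent from the input.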

neuralprophet.df_utils.check_dataframe(df, check_y=True, covariates=None, regressors=None, events=None)

Performs basic data sanity checks and ordering

Prepare dataframe for fitting or predicting.

Parameters
  • df (pd.DataFrame or list of pd.DataFrame) – with column ‘ds’

  • check_y (bool) – whether df must have series values. Set to True if training or predicting with autoregression

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

Returns

pd.DataFrame or list of pd.DataFrame
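The text above does not list the individual checks, so the following is only an illustration of what "basic sanity checks and ordering" could look like. `basic_checks` is a hypothetical helper, and the specific checks it performs (presence of ‘ds’, no NaN or duplicate timestamps, presence of ‘y’) are assumptions.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame, check_y: bool = True) -> pd.DataFrame:
    """Hypothetical stand-in for the kind of checks described above."""
    if "ds" not in df.columns:
        raise ValueError("Dataframe must have a 'ds' column.")
    df = df.copy()
    df["ds"] = pd.to_datetime(df["ds"])
    if df["ds"].isna().any():
        raise ValueError("Found NaN in column 'ds'.")
    if df["ds"].duplicated().any():
        raise ValueError("Column 'ds' has duplicate values.")
    if check_y and "y" not in df.columns:
        raise ValueError("Dataframe must have a 'y' column when check_y is True.")
    # Ordering: sort chronologically and reset the index.
    return df.sort_values("ds").reset_index(drop=True)


unordered = pd.DataFrame({"ds": ["2021-01-03", "2021-01-01", "2021-01-02"], "y": [3.0, 1.0, 2.0]})
checked = basic_checks(unordered)
```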

neuralprophet.df_utils.convert_events_to_features(df, events_config, events_df)

Converts events information into binary features of the df

Parameters
  • df (pandas DataFrame) – Dataframe with columns ‘ds’ datestamps and ‘y’ time series values

  • events_config (OrderedDict) – User specified events configs

  • events_df (pd.DataFrame) – containing column ‘ds’ and ‘event’

Returns

input df with columns for user_specified features

Return type

df (pd.DataFrame)
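As an illustration of turning event dates into binary features, here is a minimal pandas sketch. `events_to_binary_columns` is a hypothetical helper, and passing a plain list of event names in place of `events_config` is a simplifying assumption.

```python
import pandas as pd


def events_to_binary_columns(df, event_names, events_df):
    """Hypothetical helper: add one 0/1 column per event name."""
    df = df.copy()
    for name in event_names:
        # Dates on which this particular event occurs.
        dates = events_df.loc[events_df["event"] == name, "ds"]
        df[name] = df["ds"].isin(dates).astype(float)
    return df


df = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=4, freq="D"), "y": [1.0, 2.0, 3.0, 4.0]})
events_df = pd.DataFrame({"event": ["promo", "promo"], "ds": pd.to_datetime(["2021-01-02", "2021-01-04"])})
with_events = events_to_binary_columns(df, ["promo"], events_df)
```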

neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0)

Splits data in k folds for crossvalidation.

Parameters
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • fold_pct (float) – percentage of overall samples to be in each fold

  • fold_overlap_pct (float) – percentage of overlap between the validation folds. default: 0.0

Returns

list of k tuples [(df_train, df_val), …], where df_train (pd.DataFrame) is the training data and df_val (pd.DataFrame) is the validation data
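To make the roles of `k`, `fold_pct` and `fold_overlap_pct` concrete, here is a simplified sketch that works on row indices only. It is not the library's implementation: `cv_fold_ranges` is a hypothetical helper, and it ignores `n_lags` and `n_forecasts` by treating every row as one sample.

```python
def cv_fold_ranges(n_rows, k, fold_pct, fold_overlap_pct=0.0):
    """Simplified sketch: return k (train_range, val_range) pairs of row indices.

    Assumption: n_lags and n_forecasts are ignored, so every row is one sample.
    """
    samples_fold = max(1, int(fold_pct * n_rows))
    samples_overlap = int(fold_overlap_pct * samples_fold)
    if samples_overlap >= samples_fold:
        raise ValueError("overlap must be smaller than the fold size")
    folds = []
    end = n_rows
    for _ in range(k):
        val_start = end - samples_fold
        if val_start <= 0:
            raise ValueError("not enough rows for the requested folds")
        # Train on everything before the validation window.
        folds.append((range(0, val_start), range(val_start, end)))
        # Move the window back, keeping the requested overlap between folds.
        end -= samples_fold - samples_overlap
    return folds[::-1]  # chronological order, earliest fold first


folds = cv_fold_ranges(n_rows=100, k=3, fold_pct=0.1)
overlapping = cv_fold_ranges(n_rows=100, k=3, fold_pct=0.1, fold_overlap_pct=0.5)
```

In this sketch each validation window sits at the end of its training data, so later folds train on progressively more history.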

neuralprophet.df_utils.data_params_definition(df, normalize, covariates_config=None, regressor_config=None, events_config=None)

Initialize data scaling values.

Note: We do a z normalization on the target series ‘y’, unlike OG Prophet, which does shift by min and scale by max.

Parameters
  • df (pd.DataFrame) – Time series to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series. Options: [‘off’, ‘minmax’, ‘standardize’, ‘soft’, ‘soft1’]. Default: ‘soft’, unless the time series is binary, in which case ‘minmax’ is applied.

    ‘off’ bypasses data normalization
    ‘minmax’ scales the minimum value to 0.0 and the maximum value to 1.0
    ‘standardize’ zero-centers and divides by the standard deviation
    ‘soft’ scales the minimum value to 0.0 and the 95th quantile to 1.0
    ‘soft1’ scales the minimum value to 0.1 and the 90th quantile to 0.9

  • covariates_config (OrderedDict) – extra regressors with sub_parameters normalize (bool)

  • regressor_config (OrderedDict) – extra regressors (with known future values) with sub_parameters normalize (bool)

  • events_config (OrderedDict) – user specified events configs

Returns

scaling values, with ShiftScale entries containing ‘shift’ and ‘scale’ parameters

Return type

data_params (OrderedDict)
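The normalization options can be expressed as a (shift, scale) pair applied as `(value - shift) / scale`. The sketch below derives such pairs from the option descriptions above. It is an illustration under that assumption, not the library's code: `shift_scale_for` is a hypothetical helper and the `ShiftScale` dataclass is a local stand-in for the documented class.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ShiftScale:
    """Local stand-in mirroring the documented (shift, scale) pair."""
    shift: float = 0.0
    scale: float = 1.0


def shift_scale_for(values, normalize):
    """Sketch of the documented modes; normalized = (value - shift) / scale."""
    values = np.asarray(values, dtype=float)
    lo = values.min()
    if normalize == "off":
        return ShiftScale()
    if normalize == "minmax":
        return ShiftScale(shift=lo, scale=values.max() - lo)
    if normalize == "standardize":
        return ShiftScale(shift=values.mean(), scale=values.std())
    if normalize == "soft":
        return ShiftScale(shift=lo, scale=np.quantile(values, 0.95) - lo)
    if normalize == "soft1":
        span = np.quantile(values, 0.90) - lo
        # Chosen so that min -> 0.1 and the 90th quantile -> 0.9.
        return ShiftScale(shift=lo - 0.125 * span, scale=1.25 * span)
    raise ValueError(f"unknown normalize option: {normalize}")


y = np.arange(0.0, 101.0)
soft = shift_scale_for(y, "soft")
soft1 = shift_scale_for(y, "soft1")
```

Note that a constant series would give a scale of zero in this sketch, so a real implementation needs a guard for that case.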

neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)

Splits data in two sets of k folds for crossvalidation on validation and test data.

Parameters
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • valid_pct (float) – percentage of overall samples to be in validation

  • test_pct (float) – percentage of overall samples to be in test

Returns

tuple (folds_val, folds_test), where each element has the same format as the return value of crossvalidation_split_df

neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)

Adds missing dates, fills missing values with linear imputation or trend.

Parameters
  • series (pd.Series) – series with nan to be filled in.

  • limit_linear (int) – maximum number of missing values to impute. Note: because imputation is done in both directions, this value is effectively doubled.

  • rolling (int) – maximal number of missing values to impute. Note: window width is rolling + 2*limit_linear

Returns

filled df
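The two-stage imputation can be sketched with pandas as follows. `fill_gaps` is a hypothetical helper. The window width of `rolling + 2 * limit_linear` follows the parameter description above, while the `min_periods` value and the centered window are assumptions.

```python
import numpy as np
import pandas as pd


def fill_gaps(series: pd.Series, limit_linear: int, rolling: int) -> pd.Series:
    """Sketch: linear interpolation for short gaps, centered rolling mean for the rest."""
    # Fill up to `limit_linear` values from each side of a gap.
    filled = series.interpolate(method="linear", limit=limit_linear, limit_direction="both")
    still_missing = filled.isna()
    # Window width follows the description above; min_periods is an assumption.
    window = rolling + 2 * limit_linear
    rolling_avg = filled.rolling(window, min_periods=2 * limit_linear, center=True).mean()
    filled.loc[still_missing] = rolling_avg[still_missing]
    return filled


s = pd.Series([0.0, np.nan, np.nan, np.nan, np.nan, np.nan, 6.0])
result = fill_gaps(s, limit_linear=1, rolling=3)
```

In this example the outermost missing values are filled linearly, and the three values in the middle of the gap come from the rolling average of their already-filled neighbours.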

neuralprophet.df_utils.init_data_params(df, normalize, covariates_config=None, regressor_config=None, events_config=None, local_modeling=False)

Initialize data scaling values.

Note: We do a z normalization on the target series ‘y’, unlike OG Prophet, which does shift by min and scale by max.

Parameters
  • df (pd.DataFrame or list of pd.DataFrame) – Time series to compute normalization parameters from.

  • normalize (str) – Type of normalization to apply to the time series. Options: [‘soft’, ‘off’, ‘minmax’, ‘standardize’]. Default: ‘soft’, which scales the minimum to 0.1 and the 90th quantile to 0.9

  • covariates_config (OrderedDict) – extra regressors with sub_parameters normalize (bool)

  • regressor_config (OrderedDict) – extra regressors (with known future values) with sub_parameters normalize (bool)

  • events_config (OrderedDict) – user specified events configs

  • local_modeling (bool) – when set to true, each episode from the list of dataframes will be considered locally (i.e. seasonality, data_params, normalization)

Returns

scaling values, with ShiftScale entries containing ‘shift’ and ‘scale’ parameters

Return type

data_params (OrderedDict or list of OrderedDict)

neuralprophet.df_utils.join_dataframes(df_list)

Join list of dataframes preserving the episodes so it can be recovered later.

Parameters

df_list (list of pd.DataFrame) – dataframes containing columns ‘ds’, ‘y’ with training data

Returns

df_joined: dataframe with concatenated episodes
episodes: list containing episodes of each timestamp

neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, events_config=None, events_df=None, regressor_config=None, regressors_df=None)

Extends df periods number steps into future.

Parameters
  • df_columns (pandas DataFrame) – Dataframe columns

  • last_date (pandas Datetime) – last history date

  • periods (int) – number of future steps to predict

  • freq (str) – Data step size, i.e. the frequency of data recording. Any valid frequency for pd.date_range, such as ‘D’ or ‘M’

  • events_config (OrderedDict) – User specified events configs

  • events_df (pd.DataFrame) – containing column ‘ds’ and ‘event’

  • regressor_config (OrderedDict) – configuration for user specified regressors

  • regressors_df (pd.DataFrame) – containing column ‘ds’ and one column for each of the external regressors

Returns

input df with ‘ds’ extended into future, and ‘y’ set to None

Return type

df2 (pd.DataFrame)
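The core of extending a history into the future is generating the next `periods` timestamps at the given `freq`. A minimal sketch with `pd.date_range` follows. `future_dates` is a hypothetical helper that leaves out the event and regressor handling.

```python
import pandas as pd


def future_dates(last_date, periods: int, freq: str) -> pd.DataFrame:
    """Hypothetical helper: frame of `periods` future timestamps with 'y' unset."""
    last_date = pd.Timestamp(last_date)
    # Generate one extra step, then keep only dates strictly after last_date.
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date][:periods]
    return pd.DataFrame({"ds": dates, "y": None})


future = future_dates("2021-01-03", periods=3, freq="D")
```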

neuralprophet.df_utils.normalize(df, data_params, local_modeling=False)

Apply data scales.

Applies data scaling factors to df using data_params.

Parameters
  • df (pd.DataFrame or list of pd.DataFrame) – with columns ‘ds’, ‘y’, (and potentially more regressors)

  • data_params (OrderedDict) – scaling values, as returned by init_data_params, with ShiftScale entries containing ‘shift’ and ‘scale’ parameters

  • local_modeling (bool) – when set to true, each episode from the list of dataframes will be considered locally (i.e. seasonality, data_params, normalization)

Returns

pd.DataFrame or list of pd.DataFrame, normalized

Return type

df
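Applying the scaling values amounts to subtracting the shift and dividing by the scale for each configured column. The sketch below illustrates this for a single dataframe. `apply_scaling` is a hypothetical helper, the `ShiftScale` namedtuple is a local stand-in for the documented class, and overwriting the columns in place (rather than adding new ones) is an assumption.

```python
from collections import OrderedDict, namedtuple

import pandas as pd

# Local stand-in for the documented ShiftScale(shift, scale) entries.
ShiftScale = namedtuple("ShiftScale", ["shift", "scale"])


def apply_scaling(df: pd.DataFrame, data_params) -> pd.DataFrame:
    """Sketch: scale every column that has an entry in data_params."""
    df = df.copy()
    for name, params in data_params.items():
        if name in df.columns:
            df[name] = (df[name] - params.shift) / params.scale
    return df


df = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=3, freq="D"), "y": [10.0, 20.0, 30.0]})
data_params = OrderedDict(y=ShiftScale(shift=10.0, scale=20.0))
scaled = apply_scaling(df, data_params)
```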

neuralprophet.df_utils.recover_dataframes(df_joined, episodes)

Recover the list of dataframes according to the episodes.

Parameters
  • df_joined (pd.DataFrame) – Dataframe concatenated containing column ‘ds’, ‘y’ with training data

  • episodes – List containing the episodes from each timestamp

Returns

Original dataframe before concatenation

Return type

DF
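The join and recover steps form a round trip: concatenate the dataframes while recording which episode each row came from, then split on that record. The sketch below shows the idea. Both helpers are hypothetical, and the episode label format (`Ep1`, `Ep2`, and so on) is an assumption.

```python
import pandas as pd


def join_with_episodes(df_list):
    """Hypothetical helper: concatenate frames and remember each row's episode."""
    episodes = [f"Ep{i + 1}" for i, df in enumerate(df_list) for _ in range(len(df))]
    df_joined = pd.concat(df_list, ignore_index=True)
    return df_joined, episodes


def recover_from_episodes(df_joined, episodes):
    """Hypothetical helper: split the joined frame back into one frame per episode."""
    labels = pd.Series(episodes, index=df_joined.index)
    return [
        df_joined[labels == name].reset_index(drop=True)
        for name in labels.unique()
    ]


a = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=2, freq="D"), "y": [1.0, 2.0]})
b = pd.DataFrame({"ds": pd.date_range("2021-02-01", periods=3, freq="D"), "y": [3.0, 4.0, 5.0]})
joined, episodes = join_with_episodes([a, b])
recovered = recover_from_episodes(joined, episodes)
```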

neuralprophet.df_utils.split_df(df, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True, local_modeling=False)

Splits timeseries df into train and validation sets.

Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.

Parameters
  • df (pd.DataFrame or list of pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • local_modeling (bool) – when set to true, each episode from the list of dataframes will be considered locally (i.e. seasonality, data_params, normalization)

Returns

df_train (pd.DataFrame or list of pd.DataFrame): training data
df_val (pd.DataFrame or list of pd.DataFrame): validation data
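To illustrate how `n_lags`, `n_forecasts` and `inputs_overbleed` interact, the sketch below computes split points on row indices. `split_points` is a hypothetical helper. The sample-count arithmetic is an assumption about how windowing is accounted for and is not confirmed by the text above.

```python
def split_points(n_rows, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True):
    """Sketch: return (train_end, val_start) row indices for a chronological split."""
    # Each sample needs n_lags input rows and n_forecasts target rows.
    # Train and validation each lose (n_lags + n_forecasts - 1) rows to windowing;
    # with overbleed the validation set borrows its first n_lags inputs from train.
    n_samples = n_rows - n_lags + 2 - 2 * n_forecasts
    if not inputs_overbleed:
        n_samples -= n_lags
    # valid_p is either a fraction in (0, 1) or an absolute number of samples.
    n_valid = max(1, int(n_samples * valid_p)) if 0.0 < valid_p < 1.0 else int(valid_p)
    n_train = n_samples - n_valid
    train_end = n_train + n_lags + n_forecasts - 1
    val_start = train_end - n_lags if inputs_overbleed else train_end
    return train_end, val_start


rows = list(range(100))
train_end, val_start = split_points(len(rows), n_lags=3, n_forecasts=1)
train_rows, val_rows = rows[:train_end], rows[val_start:]
```

With overbleed enabled, the last `n_lags` training rows reappear at the start of the validation rows, where they serve only as inputs. Targets never overlap between the two sets.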