Core Module Documentation

class neuralprophet.df_utils.ShiftScale(shift: 'float' = 0.0, scale: 'float' = 1.0)

neuralprophet.df_utils.add_missing_dates_nan(df, freq)

Fills missing datetimes in ds, with NaN for all other columns

Parameters:
  • df (pd.DataFrame) – with column ds datetimes

  • freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M

Returns:

dataframe without date gaps, with NaN values in the added rows

Return type:

pd.DataFrame
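
The behavior described above can be approximated with plain pandas calls. The following is a minimal sketch, not the library's implementation; the helper name fill_date_gaps and the sample data are made up for illustration.

```python
import pandas as pd


def fill_date_gaps(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical helper: insert a row for every missing timestamp in ds."""
    # Build the complete range of timestamps between the first and last date.
    full_range = pd.date_range(start=df["ds"].min(), end=df["ds"].max(), freq=freq)
    # Reindexing on ds adds the missing dates; all other columns become NaN there.
    return df.set_index("ds").reindex(full_range).rename_axis("ds").reset_index()


df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"]),
        "y": [1.0, 2.0, 5.0],
    }
)
filled = fill_date_gaps(df, "D")
print(filled)
```

In this example the two missing days (January 3 and 4) are added as rows whose y value is NaN.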

neuralprophet.df_utils.check_dataframe(df, check_y=True, covariates=None, regressors=None, events=None)

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters:
  • df (pd.DataFrame) – containing column ds

  • check_y (bool) – whether df must have series values (set to True if training or predicting with autoregression)

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

Returns:

checked dataframe

Return type:

pd.DataFrame or dict

neuralprophet.df_utils.check_single_dataframe(df, check_y, covariates, regressors, events)

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters:
  • df (pd.DataFrame) – containing column ds

  • check_y (bool) – whether df must have series values (set to True if training or predicting with autoregression)

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

Return type:

pd.DataFrame

neuralprophet.df_utils.convert_events_to_features(df, config_events, events_df)

Converts events information into binary features of the df

Parameters:
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • config_events (configure.ConfigEvents) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

Returns:

input df with columns for user-specified features

Return type:

pd.DataFrame
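
To make the idea of binary event features concrete, here is a simplified sketch in plain pandas. It ignores config_events entirely, and the helper name add_event_columns and the "promo" event are hypothetical; the library's own function may behave differently in detail.

```python
import pandas as pd


def add_event_columns(df: pd.DataFrame, events_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one 0/1 column per event name."""
    out = df.copy()
    for event_name, rows in events_df.groupby("event"):
        # 1.0 where the timestamp matches an occurrence of this event, else 0.0.
        out[event_name] = out["ds"].isin(rows["ds"]).astype(float)
    return out


df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
        "y": [1.0, 2.0, 3.0],
    }
)
events_df = pd.DataFrame({"event": ["promo"], "ds": pd.to_datetime(["2023-01-02"])})
with_events = add_event_columns(df, events_df)
print(with_events)
```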

neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)

Convert numeric frequencies into frequency tags

Parameters:
  • freq_num (int) – numeric values of delta in ms

  • initial_time_stamp (str) – initial time stamp of data

Returns:

frequency tag

Return type:

str

neuralprophet.df_utils.convert_str_to_num_freq(freq_str)

Convert frequency tags into numeric delta in ms

Parameters:

freq_str (str) – frequency tag

Returns:

frequency numeric delta in ms

Return type:

numeric

neuralprophet.df_utils.create_dict_for_events_or_regressors(df, other_df, other_df_name)

Create a dict for events or regressors according to input df.

Parameters:
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • other_df (pd.DataFrame) – Dataframe with events or regressors

  • other_df_name (str) – Definition of other_df (i.e. ‘events’, ‘regressors’)

Returns:

dictionary with events or regressors

Return type:

dict

neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0, global_model_cv_type='global-time')

Splits data in k folds for crossvalidation.

Parameters:
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • fold_pct (float) – percentage of overall samples to be in each fold

  • fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)

  • global_model_cv_type (str) –

    Type of crossvalidation to apply to the time series.

    Options
    • (default) global-time: crossvalidation is performed according to a timestamp threshold.

    • local: each episode is crossvalidated locally (may cause time leakage among different episodes).

    • intersect: only the time intersection of all the episodes is considered. A considerable amount of data may not be used. However, this approach guarantees an equal number of train/test samples for each episode.

Returns:

training data

validation data

Return type:

list of k tuples [(df_train, df_val), …]
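
The roles of k, fold_pct and fold_overlap_pct can be illustrated on plain sample indices. The sketch below is a simplified assumption of how such folds could be laid out: it uses an expanding training window and deliberately ignores n_lags, n_forecasts and multiple time series. The function name expanding_window_folds is hypothetical.

```python
def expanding_window_folds(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Hypothetical helper: return k (train_range, val_range) index pairs."""
    samples_fold = max(1, int(fold_pct * n_samples))
    samples_overlap = int(fold_overlap_pct * samples_fold)
    # Each earlier fold ends this many samples before the next one.
    step = samples_fold - samples_overlap
    folds = []
    end = n_samples
    for _ in range(k):
        val_start = end - samples_fold
        if val_start <= 0:
            raise ValueError("Not enough samples for the requested folds.")
        # Train on everything before the validation window.
        folds.append((range(0, val_start), range(val_start, end)))
        end -= step
    # Folds were built from the end of the series; return them in time order.
    return folds[::-1]


folds = expanding_window_folds(n_samples=100, k=3, fold_pct=0.1)
for train_idx, val_idx in folds:
    print(f"train 0..{train_idx.stop - 1}, validate {val_idx.start}..{val_idx.stop - 1}")
```

With these settings each validation fold holds 10 samples, and the training window grows from 70 to 90 samples across the three folds.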

neuralprophet.df_utils.data_params_definition(df, normalize, config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors=None, config_events=None)

Initialize data scaling values.

Note

We apply a z-normalization to the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.

Parameters:
  • df (pd.DataFrame) – Time series to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    Options
    • (default) soft, unless the time series is binary, in which case minmax is applied.

    • off: bypasses data normalization

    • minmax: scales the minimum value to 0.0 and the maximum value to 1.0

    • standardize: zero-centers and divides by the standard deviation

    • soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

    • soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9

  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) with sub_parameters normalize (bool)

  • config_events (configure.ConfigEvents) – user specified events configs

Returns:

scaling values with ShiftScale entries containing shift and scale parameters.

Return type:

OrderedDict
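
As an illustration of how shift and scale values relate to the normalization options, the sketch below computes them for the off, minmax and standardize options using only the standard library. It is an assumption-laden example, not the library's code: ShiftScaleSketch and compute_shift_scale are hypothetical names, the soft and soft1 options are left out, and the use of the population standard deviation is a choice made for this example.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class ShiftScaleSketch:
    """Hypothetical stand-in mirroring the ShiftScale(shift, scale) signature."""

    shift: float = 0.0
    scale: float = 1.0


def compute_shift_scale(values, normalize):
    """Hypothetical helper: derive shift and scale so that (x - shift) / scale is normalized."""
    if normalize == "off":
        return ShiftScaleSketch()
    if normalize == "minmax":
        low, high = min(values), max(values)
        # Fall back to a scale of 1.0 for constant series to avoid dividing by zero.
        return ShiftScaleSketch(shift=low, scale=(high - low) or 1.0)
    if normalize == "standardize":
        # Population standard deviation is an assumption of this sketch.
        return ShiftScaleSketch(shift=fmean(values), scale=pstdev(values) or 1.0)
    raise ValueError(f"Option not covered by this sketch: {normalize}")


y = [2.0, 4.0, 6.0, 8.0]
minmax_params = compute_shift_scale(y, "minmax")
standard_params = compute_shift_scale(y, "standardize")
print(minmax_params, standard_params)
```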

neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)

Splits data in two sets of k folds for crossvalidation on validation and test data.

Parameters:
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • valid_pct (float) – percentage of overall samples to be in validation

  • test_pct (float) – percentage of overall samples to be in test

Returns:

elements same as crossvalidation_split_df() returns

Return type:

tuple of k tuples [(folds_val, folds_test), …]

neuralprophet.df_utils.drop_missing_from_df(df, drop_missing, predict_steps, n_lags)

Drops windows of missing values in df according to the (lagged) samples that are dropped from TimeDataset.

Parameters:
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • drop_missing (bool) – identical to NeuralProphet

  • predict_steps (int) – number of steps to forecast, identical to n_forecasts in NeuralProphet

  • n_lags (int) – identical to NeuralProphet

Returns:

dataframe with dropped NaN windows

Return type:

pd.DataFrame

neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)

Fills missing values in a series, first with linear imputation and then with a rolling average.

Parameters:
  • series (pd.Series) – series with nan to be filled in.

  • limit_linear (int) –

    maximum number of missing values to impute linearly.

    Note

    because imputation is done in both directions, this value is effectively doubled.

  • rolling (int) –

    maximum number of missing values to impute with the rolling average.

    Note

    window width is rolling + 2*limit_linear

Returns:

manipulated dataframe containing filled values

Return type:

pd.DataFrame
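
The two-stage filling and the window width noted above (rolling + 2*limit_linear) can be sketched with pandas as follows. This is an illustrative approximation under stated assumptions, not the library's implementation: the function name fill_sketch, the centered window, and the min_periods choice are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def fill_sketch(series: pd.Series, limit_linear: int, rolling: int) -> pd.Series:
    """Hypothetical helper: linear interpolation first, rolling average second."""
    # Stage 1: interpolate at most limit_linear values from each side of a gap.
    filled = series.interpolate(method="linear", limit=limit_linear, limit_direction="both")
    still_missing = filled.isna()
    # Stage 2: fill what is left with a centered rolling mean.
    window = rolling + 2 * limit_linear
    rolling_avg = filled.rolling(window, min_periods=2 * limit_linear, center=True).mean()
    filled.loc[still_missing] = rolling_avg[still_missing]
    return filled


series = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0])
result = fill_sketch(series, limit_linear=1, rolling=3)
print(result)
```

Here the outermost values of the gap are interpolated linearly, and the three values in the middle are taken from the rolling mean of their neighbors.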

neuralprophet.df_utils.find_time_threshold(df, n_lags, n_forecasts, valid_p, inputs_overbleed)

Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.

Parameters:
  • df (pd.DataFrame) – data with column ds, y, and ID

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float) – fraction (0,1) of data to use for holdout validation set

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

Returns:

timestamp threshold that defines the boundary between the train and validation sets.

Return type:

str

neuralprophet.df_utils.find_valid_time_interval_for_cv(df)

Find the time interval of intersection among all the time series.

Parameters:

df (pd.DataFrame) – data with column ds, y, and ID

Returns:

  • str – time interval start

  • str – time interval end

neuralprophet.df_utils.get_dist_considering_two_freqs(dist)

Sums the occurrences of the two most common frequencies.

Note

Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).

Parameters:

dist (list) – list of occurrence counts of frequencies

Returns:

sum of the occurrence counts of the two most common frequencies

Return type:

numeric
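
A small example shows why the two most common counts are combined for calendar-based frequencies. For monthly data, the gaps between consecutive month starts are mostly 31 or 30 days, so neither count alone represents the dominant frequency. The helper name sum_two_most_common is hypothetical and this is only a sketch of the idea.

```python
def sum_two_most_common(dist):
    """Hypothetical helper: add up the two largest occurrence counts."""
    return sum(sorted(dist, reverse=True)[:2])


# Gaps between month starts over one non-leap year:
# 7 gaps of 31 days, 4 gaps of 30 days, 1 gap of 28 days.
month_gap_counts = [7, 4, 1]
combined = sum_two_most_common(month_gap_counts)
share = combined / sum(month_gap_counts)
print(combined, round(share, 3))
```

Combined, the two most common gaps cover 11 of the 12 observed deltas, whereas the single most common gap covers only 7.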

neuralprophet.df_utils.get_freq_dist(ds_col)

Get frequency distribution of ds column.

Parameters:

ds_col (pd.DataFrame) – ds column of dataframe

Returns:

numeric delta values (ms) and distribution of frequency counts

Return type:

tuple

neuralprophet.df_utils.get_max_num_lags(config_lagged_regressors: Optional[ConfigLaggedRegressors], n_lags)

Get the greatest number of lags between the autoregression lags and the covariates lags.

Parameters:
  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • n_lags (int) – number of lagged values of series to include as model inputs

Returns:

Maximum number of lags between the autoregression lags and the covariates lags.

Return type:

int

neuralprophet.df_utils.handle_negative_values(df, col, handle_negatives)

Handles negative values in a column according to the handle_negatives parameter.

Parameters:
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • col (str) – name of the regressor column

  • handle_negatives (str, int, float) –

    specified handling of negative values in the regressor column. Can be one of the following options:

    Options
    • remove: Remove all negative values of the regressor.

    • error: Raise an error in case of a negative value.

    • float or int: Replace negative values with the provided value.

    • (default) None: Do not handle negative values.

Returns:

dataframe with handled negative values

Return type:

pd.DataFrame
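
The options listed above can be pictured with the following pandas sketch. It is not the library's implementation: handle_negatives_sketch is a hypothetical name, and treating remove as dropping the affected rows is an assumption of this example.

```python
import pandas as pd


def handle_negatives_sketch(df: pd.DataFrame, col: str, handle_negatives=None) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented options."""
    negative = df[col] < 0
    if handle_negatives is None or not negative.any():
        return df  # nothing to do
    if handle_negatives == "error":
        raise ValueError(f"Column {col!r} contains negative values.")
    if handle_negatives == "remove":
        # Assumption: removing negative values means dropping those rows.
        return df.loc[~negative].reset_index(drop=True)
    if isinstance(handle_negatives, (int, float)):
        out = df.copy()
        out.loc[negative, col] = handle_negatives
        return out
    raise ValueError(f"Unsupported option: {handle_negatives!r}")


df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
        "temp": [1.0, -2.0, 3.0],
    }
)
replaced = handle_negatives_sketch(df, "temp", 0.0)
removed = handle_negatives_sketch(df, "temp", "remove")
print(replaced["temp"].tolist(), removed["temp"].tolist())
```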

neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)

Automatically infers frequency of dataframe.

Parameters:
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values, and optionally ID

  • freq (str) –

    Data step sizes, i.e. frequency of data recording.

    Note

    Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.

  • n_lags (int) – identical to NeuralProphet

  • min_freq_percentage (float) – threshold for defining major frequency of data (default: 0.7)

Returns:

Valid frequency tag according to major frequency.

Return type:

str
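
The role of min_freq_percentage can be illustrated with a standard-library sketch: count the gaps between consecutive timestamps and accept the most common gap only if its share reaches the threshold. This is a simplified assumption of the idea, not the library's implementation. The name major_time_delta is hypothetical, and the sketch returns a timedelta (or None) rather than a frequency tag.

```python
from collections import Counter
from datetime import datetime, timedelta


def major_time_delta(timestamps, min_freq_percentage=0.7):
    """Hypothetical helper: most common gap if it is frequent enough, else None."""
    deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not deltas:
        raise ValueError("At least two timestamps are required.")
    delta, count = Counter(deltas).most_common(1)[0]
    # Reject the candidate if it does not dominate the distribution.
    if count / len(deltas) < min_freq_percentage:
        return None
    return delta


start = datetime(2023, 1, 1)
mostly_daily = [start + timedelta(days=d) for d in (0, 1, 2, 3, 5, 6, 7, 8, 9)]
irregular = [start + timedelta(days=d) for d in (0, 1, 3, 6, 10)]

print(major_time_delta(mostly_daily))
print(major_time_delta(irregular))
```

In the first series, 7 of 8 gaps are one day, which exceeds the 0.7 threshold; in the second, no gap occurs more than once, so no major frequency is found.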

neuralprophet.df_utils.init_data_params(df, normalize='auto', config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors=None, config_events=None, global_normalization=False, global_time_normalization=False)

Initialize data scaling values.

Note

We compute and store local and global normalization parameters independent of settings.

Parameters:
  • df (pd.DataFrame) – data to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    Options
    • (default) soft, unless the time series is binary, in which case minmax is applied.

    • off: bypasses data normalization

    • minmax: scales the minimum value to 0.0 and the maximum value to 1.0

    • standardize: zero-centers and divides by the standard deviation

    • soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

    • soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9

  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values)

  • config_events (configure.ConfigEvents) – user specified events configs

  • global_normalization (bool) –

    True: sets global modeling training with global normalization

    False: sets global modeling training with local normalization

  • global_time_normalization (bool) –

    True: normalize time globally across all time series

    False: normalize time locally for each time series

    (only valid in case of global modeling - local normalization)

Returns:

nested dict with data_params for each dataset, where each entry is an OrderedDict of ShiftScale entries containing shift and scale parameters for each column

Return type:

OrderedDict

neuralprophet.df_utils.join_dfs_after_data_drop(predicted, df, merge=False)

Creates the intersection between df and predicted, removing any dates that have been imputed and dropped in NeuralProphet.predict().

Parameters:
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • predicted (pd.DataFrame) – output dataframe of NeuralProphet.predict.

  • merge (bool) –

    whether to merge predicted and df into one dataframe.

    Options
    • (default) False: Returns separate dataframes

    • True: Merges predicted and df into one dataframe

Returns:

dataframe without the dates that have been imputed and dropped

Return type:

pd.DataFrame

neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, config_events=None, events_df=None, config_regressors=None, regressors_df=None)

Extends df by periods steps into the future.

Parameters:
  • df_columns (pd.DataFrame) – Dataframe columns

  • last_date (pd.Datetime) – last history date

  • periods (int) – number of future steps to predict

  • freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as D or M

  • config_events (configure.ConfigEvents) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

  • config_regressors (configure.ConfigFutureRegressors) – configuration for user specified regressors

  • regressors_df (pd.DataFrame) – containing column ds and one column for each of the external regressors

Returns:

input df with ds extended into future, and y set to None

Return type:

pd.DataFrame
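
The core of extending a series into the future is generating the next timestamps after last_date. The sketch below shows this with pd.date_range only; it leaves out events and regressors, assumes last_date is aligned with freq, and uses the hypothetical helper name future_frame.

```python
import pandas as pd


def future_frame(last_date, periods: int, freq: str) -> pd.DataFrame:
    """Hypothetical helper: timestamps after last_date with y left empty."""
    # Generate one extra step and drop the first entry, which is last_date itself
    # (assuming last_date lies on the frequency grid).
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)[1:]
    return pd.DataFrame({"ds": dates, "y": None})


future = future_frame(pd.Timestamp("2023-01-10"), periods=3, freq="D")
print(future)
```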

neuralprophet.df_utils.merge_dataframes(df)

Joins dataframes for procedures such as splitting data, setting auto seasonalities, and others.

Parameters:

df (pd.DataFrame) – containing column ds, y, and ID with data

Returns:

Dataframe with concatenated time series (sorted ‘ds’, duplicates removed, index reset)

Return type:

pd.DataFrame

neuralprophet.df_utils.normalize(df, data_params)

Applies data scaling factors to df using data_params.

Parameters:
  • df (pd.DataFrame) – with columns ds, y, (and potentially more regressors)

  • data_params (OrderedDict) – scaling values, as returned by init_data_params with ShiftScale entries containing shift and scale parameters

Returns:

normalized dataframes

Return type:

pd.DataFrame
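
Applying shift and scale parameters amounts to computing (value - shift) / scale per column. The standard-library sketch below shows this on a dict of columns; ShiftScaleSketch and apply_scaling are hypothetical names, the parameter values are made up, and the real function operates on dataframes rather than dicts.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class ShiftScaleSketch:
    """Hypothetical stand-in mirroring the ShiftScale(shift, scale) signature."""

    shift: float = 0.0
    scale: float = 1.0


def apply_scaling(columns, data_params):
    """Hypothetical helper: scale every column that has an entry in data_params."""
    scaled = {}
    for name, values in columns.items():
        if name in data_params:
            params = data_params[name]
            scaled[name] = [(v - params.shift) / params.scale for v in values]
        else:
            scaled[name] = list(values)  # columns without parameters are left as-is
    return scaled


columns = {"y": [10.0, 20.0, 30.0], "temperature": [0.0, 5.0, 10.0]}
data_params = OrderedDict(
    y=ShiftScaleSketch(shift=10.0, scale=20.0),
    temperature=ShiftScaleSketch(shift=0.0, scale=10.0),
)
normalized = apply_scaling(columns, data_params)
print(normalized)
```

The inverse transformation, value * scale + shift, recovers the original values.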

neuralprophet.df_utils.prep_or_copy_df(df)

Copy df if it contains the ID column. Creates ID column with ‘__df__’ if it is a df with a single time series. Converts a dict to the right df format (it will be deprecated soon).

Parameters:

df (pd.DataFrame, dict (deprecated)) – df or dict containing data

Returns:

  • pd.DataFrame – df with ID col

  • bool – whether the ID col was present

  • bool – whether it is a single time series

  • bool – whether a dict was received
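
The handling of the ID column can be sketched as follows for the dataframe case. This is a simplified illustration, not the library's implementation: with_id_column is a hypothetical name, and the deprecated dict input is not covered.

```python
import pandas as pd


def with_id_column(df: pd.DataFrame):
    """Hypothetical helper: copy df and make sure it has an ID column."""
    out = df.copy()
    received_id_col = "ID" in out.columns
    if not received_id_col:
        # A df without an ID column is treated as one time series.
        out["ID"] = "__df__"
    received_single_time_series = out["ID"].nunique() == 1
    return out, received_id_col, received_single_time_series


df = pd.DataFrame({"ds": pd.to_datetime(["2023-01-01", "2023-01-02"]), "y": [1.0, 2.0]})
prepared, received_id_col, single_series = with_id_column(df)
print(prepared)
```

The returned flags make it possible to restore the original format later, for example by dropping the ID column again if it was not present in the input.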

neuralprophet.df_utils.return_df_in_original_format(df, received_ID_col=False, received_single_time_series=True, received_dict=False)

Return dataframe in the original format.

Parameters:
  • df (pd.DataFrame) – df with data

  • received_ID_col (bool) – whether the ID col was present

  • received_single_time_series (bool) – whether it is a single time series

  • received_dict (bool) – whether data originated from a dict

Returns:

original input format

Return type:

pd.DataFrame, dict (deprecated)

neuralprophet.df_utils.split_considering_timestamp(df, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)

Splits timeseries into train and validation sets according to given threshold_time_stamp.

Parameters:
  • df (pd.DataFrame) – data with column ds, y, and ID

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • threshold_time_stamp (str) – time stamp boundary that defines splitting of data

Returns:

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data

neuralprophet.df_utils.split_df(df, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True, local_split=False)

Splits timeseries df into train and validation sets.

Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.

Parameters:
  • df (pd.DataFrame) – dataframe containing column ds, y, and optionally ID with all data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • local_split (bool) – when set to true, each episode from a dict of dataframes will be split locally

Returns:

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data
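
The effect of inputs_overbleed can be shown on a plain list of values. In the sketch below, the validation set optionally starts n_lags samples early, so that the last training targets serve as inputs (lags) for the first validation target without ever being validation targets themselves. The sketch is a simplified assumption: split_series is a hypothetical name, valid_p is only handled as a fraction, and n_forecasts is ignored.

```python
def split_series(values, n_lags, valid_p=0.2, inputs_overbleed=True):
    """Hypothetical helper: split a sequence into train and validation parts."""
    n_valid = max(1, int(len(values) * valid_p))
    split_idx = len(values) - n_valid
    train = values[:split_idx]
    # With overbleed, the validation slice also carries the n_lags preceding
    # samples, which are used as inputs only.
    val_start = max(0, split_idx - n_lags) if inputs_overbleed else split_idx
    return train, values[val_start:]


values = list(range(10))
train, val_with_overbleed = split_series(values, n_lags=2, valid_p=0.2, inputs_overbleed=True)
_, val_without_overbleed = split_series(values, n_lags=2, valid_p=0.2, inputs_overbleed=False)
print(train, val_with_overbleed, val_without_overbleed)
```

Without overbleed, the first validation samples have no preceding lags available inside the validation set, which reduces the number of usable validation targets when n_lags is greater than zero.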

neuralprophet.df_utils.unfold_dict_of_folds(folds_dict, k)

Converts a dict of folds into the typical list-of-folds format of train and test data.

Parameters:
  • folds_dict (dict) – dict of folds

  • k (int) – number of folds initially set

Returns:

training data

validation data

Return type:

list of k tuples [(df_train, df_val), …]