Core Module Documentation

class neuralprophet.df_utils.ShiftScale(shift: float = 0.0, scale: float = 1.0)
neuralprophet.df_utils.add_missing_dates_nan(df, freq)

Fills missing datetimes in ds, with NaN for all other columns

Parameters
  • df (pd.DataFrame) – with column ds containing datetimes

  • freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M

Returns

dataframe without date gaps, with NaN values in the added rows

Return type

pd.DataFrame
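
The gap-filling behaviour described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: add_missing_dates_sketch is a hypothetical name, and it assumes ds holds unique timestamps.

```python
import pandas as pd


def add_missing_dates_sketch(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical stand-in: reindex onto a gap-free date range."""
    df = df.copy()
    df["ds"] = pd.to_datetime(df["ds"])
    # Build the complete range between the first and last observed timestamp.
    full_range = pd.date_range(start=df["ds"].min(), end=df["ds"].max(), freq=freq)
    # Reindexing inserts the missing timestamps; their other columns become NaN.
    filled = df.set_index("ds").reindex(full_range)
    return filled.rename_axis("ds").reset_index()


df = pd.DataFrame({"ds": ["2022-01-01", "2022-01-02", "2022-01-05"], "y": [1.0, 2.0, 5.0]})
filled = add_missing_dates_sketch(df, freq="D")
```

Here the two missing days (January 3 and 4) are added as rows whose y is NaN.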

neuralprophet.df_utils.check_dataframe(df, check_y=True, covariates=None, regressors=None, events=None)

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters
  • df (pd.DataFrame or dict) – containing column ds

  • check_y (bool) – if df must have series values (True if training or predicting with autoregression)

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

Returns

checked dataframe

Return type

pd.DataFrame or dict

neuralprophet.df_utils.check_single_dataframe(df, check_y, covariates, regressors, events)

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters
  • df (pd.DataFrame) – with column ds

  • check_y (bool) – if df must have series values (True if training or predicting with autoregression)

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

Returns

checked dataframe

Return type

pd.DataFrame

neuralprophet.df_utils.compare_dict_keys(dict_1, dict_2, name_dict_1, name_dict_2)

Compare keys of two different dicts (e.g., events and dataframes).

Parameters
  • dict_1 (dict) – first dict

  • dict_2 (dict) – second dict

  • name_dict_1 (str) – name of first dict

  • name_dict_2 (str) – name of second dict

neuralprophet.df_utils.convert_events_to_features(df, events_config, events_df)

Converts events information into binary features of the df

Parameters
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • events_config (OrderedDict) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

Returns

input df with columns for user_specified features

Return type

pd.DataFrame
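
As an illustration of turning an events table into binary feature columns, here is a minimal sketch under stated assumptions. events_to_binary_sketch is a hypothetical helper; it takes a plain list of event names in place of events_config and is not the library's implementation.

```python
import pandas as pd


def events_to_binary_sketch(df, event_names, events_df):
    """Hypothetical stand-in: add one 0/1 column per event name."""
    out = df.copy()
    for name in event_names:
        # Dates on which this particular event occurs.
        event_dates = events_df.loc[events_df["event"] == name, "ds"]
        out[name] = out["ds"].isin(event_dates).astype(float)
    return out


df = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=4, freq="D"), "y": [1.0, 2.0, 3.0, 4.0]})
events_df = pd.DataFrame({"event": ["promo", "promo"], "ds": pd.to_datetime(["2022-01-02", "2022-01-04"])})
features = events_to_binary_sketch(df, ["promo"], events_df)
```

The resulting promo column is 1.0 on the two event dates and 0.0 elsewhere.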

neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)

Convert numeric frequencies into frequency tags

Parameters
  • freq_num (int) – numeric values of delta in ms

  • initial_time_stamp (str) – initial time stamp of data

Returns

frequency tag

Return type

str

neuralprophet.df_utils.convert_str_to_num_freq(freq_str)

Convert frequency tags into numeric delta in ms

Parameters

freq_str (str) – frequency tag

Returns

frequency numeric delta in ms

Return type

numeric

neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0)

Splits data in k folds for crossvalidation.

Parameters
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • fold_pct (float) – percentage of overall samples to be in each fold

  • fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)

Returns

training data

validation data

Return type

list of k tuples [(df_train, df_val), …]
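
To make the roles of k, fold_pct and fold_overlap_pct concrete, the sketch below computes index boundaries for the folds. It is a simplified illustration, not the library's algorithm: fold_boundaries_sketch is a hypothetical helper, it assumes folds are placed at the end of the series, and it ignores n_lags and n_forecasts.

```python
def fold_boundaries_sketch(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Return k (train_end, val_end) index pairs, earliest fold first.

    Training data of a fold is samples[:train_end];
    validation data is samples[train_end:val_end].
    """
    samples_fold = max(1, int(fold_pct * n_samples))
    samples_overlap = int(fold_overlap_pct * samples_fold)
    step = samples_fold - samples_overlap  # distance between consecutive folds
    folds = []
    val_end = n_samples
    for _ in range(k):
        train_end = val_end - samples_fold
        if train_end <= 0:
            raise ValueError("not enough samples for the requested folds")
        folds.append((train_end, val_end))
        val_end -= step
    return folds[::-1]


folds = fold_boundaries_sketch(n_samples=100, k=3, fold_pct=0.1)
overlapping = fold_boundaries_sketch(n_samples=100, k=2, fold_pct=0.1, fold_overlap_pct=0.5)
```

With 100 samples and fold_pct=0.1, each validation fold holds 10 samples; a fold_overlap_pct of 0.5 makes consecutive validation folds share 5 of them.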

neuralprophet.df_utils.data_params_definition(df, normalize, covariates_config=None, regressor_config=None, events_config=None)

Initialize data scaling values.

Note

We apply a z-normalization to the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.

Parameters
  • df (pd.DataFrame) – Time series to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    options:

    auto applies soft, unless the time series is binary, in which case minmax is applied.

    off bypasses data normalization

    minmax scales the minimum value to 0.0 and the maximum value to 1.0

    standardize zero-centers and divides by the standard deviation

    soft scales the minimum value to 0.0 and the 95th quantile to 1.0

    soft1 scales the minimum value to 0.1 and the 90th quantile to 0.9

  • covariates_config (OrderedDict) – extra regressors with sub_parameters normalize (data normalization)

  • regressor_config (OrderedDict) – extra regressors (with known future values) with sub_parameters normalize (bool)

  • events_config (OrderedDict) – user specified events configs

Returns

scaling values with ShiftScale entries containing shift and scale parameters.

Return type

OrderedDict
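
The normalization options listed above can each be expressed as a shift and a scale applied as (x - shift) / scale. The following is a minimal sketch of that arithmetic, assuming a population standard deviation for standardize and linear-interpolation quantiles; ShiftScaleSketch, quantile, shift_scale_for and apply are hypothetical names, and the library's own computation may differ in detail.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ShiftScaleSketch:
    """Hypothetical mirror of ShiftScale(shift, scale)."""
    shift: float = 0.0
    scale: float = 1.0


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def shift_scale_for(values, mode):
    """Compute shift and scale so that (x - shift) / scale matches the mode."""
    lowest = min(values)
    if mode == "off":
        return ShiftScaleSketch()
    if mode == "minmax":
        return ShiftScaleSketch(lowest, max(values) - lowest)
    if mode == "standardize":
        return ShiftScaleSketch(mean(values), pstdev(values))
    if mode == "soft":
        return ShiftScaleSketch(lowest, quantile(values, 0.95) - lowest)
    if mode == "soft1":
        # Map the minimum to 0.1 and the 90th quantile to 0.9.
        span = (quantile(values, 0.90) - lowest) / 0.8
        return ShiftScaleSketch(lowest - 0.1 * span, span)
    raise ValueError(f"unknown mode: {mode}")


def apply(values, params):
    """Apply the scaling; a constant series (scale 0) is not handled here."""
    return [(x - params.shift) / params.scale for x in values]


y = [2.0, 4.0, 6.0, 8.0, 10.0]
scaled_minmax = apply(y, shift_scale_for(y, "minmax"))
scaled_soft1 = apply(y, shift_scale_for(y, "soft1"))
```

For this series, minmax maps 2.0 to 0.0 and 10.0 to 1.0, while soft1 maps the minimum to 0.1.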

neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)

Splits data in two sets of k folds for crossvalidation on validation and test data.

Parameters
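  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • valid_pct (float) – percentage of overall samples to be in validation

  • test_pct (float) – percentage of overall samples to be in test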

Returns

elements same as crossvalidation_split_df() returns

Return type

tuple of k tuples [(folds_val, folds_test), …]

neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)

Adds missing dates, fills missing values with linear imputation or trend.

Parameters
  • series (pd.Series) – series with nan to be filled in.

  • limit_linear (int) –

    maximum number of missing values to impute.

    Note

    because imputation is done in both directions, this value is effectively doubled.

  • rolling (int) –

    maximal number of missing values to impute.

    Note

    window width is rolling + 2*limit_linear

Returns

manipulated dataframe containing filled values

Return type

pd.DataFrame
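
A two-stage fill of this kind can be sketched with pandas as follows. This is an illustration under assumptions, not the library's code: fill_linear_then_rolling_sketch is a hypothetical name, and the centred window with min_periods=1 is a choice made for this example only.

```python
import numpy as np
import pandas as pd


def fill_linear_then_rolling_sketch(series, limit_linear, rolling):
    """Hypothetical stand-in: short gaps linearly, remaining gaps by rolling mean."""
    # Step 1: linear interpolation, at most limit_linear values from each side of a gap.
    filled = series.interpolate(method="linear", limit=limit_linear, limit_direction="both")
    # Step 2: centred rolling mean over a window of rolling + 2 * limit_linear.
    window = rolling + 2 * limit_linear
    rolling_mean = filled.rolling(window, min_periods=1, center=True).mean()
    # Only positions that are still missing take the rolling-mean value.
    return filled.fillna(rolling_mean)


series = pd.Series([1.0, np.nan, 3.0, 4.0])
result = fill_linear_then_rolling_sketch(series, limit_linear=1, rolling=1)
```

In this small example the single gap is short enough to be closed by the linear step alone.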

neuralprophet.df_utils.find_time_threshold(df_dict, n_lags, valid_p, inputs_overbleed)

Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.

Parameters
  • df_dict (dict) – dict of data

  • n_lags (int) – identical to NeuralProphet

  • valid_p (float) – fraction (0,1) of data to use for holdout validation set

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

Returns

time stamp threshold that defines the boundary between the train and validation sets.

Return type

str

neuralprophet.df_utils.get_dist_considering_two_freqs(dist)

Add the occurrence counts of the two most common frequencies

Note

Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).

Parameters

dist (list) – list of occurrence counts of frequencies

Returns

sum of the occurrence counts of the two most common frequencies

Return type

numeric

neuralprophet.df_utils.get_freq_dist(ds_col)

Get frequency distribution of ds column.

Parameters

ds_col (pd.DataFrame) – ds column of dataframe

Returns

numeric delta values (ms) and distribution of frequency counts

Return type

tuple

neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)

Automatically infers frequency of dataframe or dict of dataframes.

Parameters
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • freq (str) –

    Data step sizes, i.e. frequency of data recording.

    Note

    Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.

  • n_lags (int) – identical to NeuralProphet

  • min_freq_percentage (float) – threshold for defining major frequency of data (default: 0.7)

Returns

Valid frequency tag according to major frequency.

Return type

str
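
The idea of a "major frequency" can be illustrated by counting the steps between consecutive timestamps and checking the most common one against min_freq_percentage. The sketch below uses only the standard library; dominant_delta_sketch is a hypothetical helper, and raising ValueError when the threshold is missed is an assumption of this example, not documented library behaviour.

```python
from collections import Counter
from datetime import datetime, timedelta


def dominant_delta_sketch(timestamps, min_freq_percentage=0.7):
    """Return the most common step between consecutive timestamps.

    Raises ValueError when that step covers too small a share of all steps.
    """
    deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    counts = Counter(deltas)
    delta, occurrences = counts.most_common(1)[0]
    share = occurrences / len(deltas)
    if share < min_freq_percentage:
        raise ValueError(f"most common step covers only {share:.0%} of the data")
    return delta


start = datetime(2022, 1, 1)
# Daily data with one missing day (January 4).
stamps = [start + timedelta(days=d) for d in (0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11)]
step = dominant_delta_sketch(stamps)
```

Nine of the ten steps are one day long, so the daily step clears the 0.7 threshold despite the gap.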

neuralprophet.df_utils.init_data_params(df_dict, normalize='auto', covariates_config=None, regressor_config=None, events_config=None, global_normalization=False, global_time_normalization=False)

Initialize data scaling values.

Note

We compute and store local and global normalization parameters independent of settings.

Parameters
  • df_dict (dict) – dict of DataFrames to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    options:

    auto (default) applies soft, unless the time series is binary, in which case minmax is applied.

    off bypasses data normalization

    minmax scales the minimum value to 0.0 and the maximum value to 1.0

    standardize zero-centers and divides by the standard deviation

    soft scales the minimum value to 0.0 and the 95th quantile to 1.0

    soft1 scales the minimum value to 0.1 and the 90th quantile to 0.9

  • covariates_config (OrderedDict) – extra regressors with sub_parameters

  • regressor_config (OrderedDict) – extra regressors (with known future values)

  • events_config (OrderedDict) – user specified events configs

  • global_normalization (bool) –

    True: sets global modeling training with global normalization

    False: sets global modeling training with local normalization

  • global_time_normalization (bool) –

    True: normalize time globally across all time series

    False: normalize time locally for each time series

    (only valid in case of global modeling - local normalization)

Returns

  • OrderedDict – nested dict with data_params for each dataset where each contains

  • OrderedDict – ShiftScale entries containing shift and scale parameters for each column

neuralprophet.df_utils.join_dataframes(df_dict)

Join dict of dataframes preserving the episodes so it can be recovered later.

Parameters

df_dict (dict of pd.DataFrame) – containing column ds, y with training data

Returns

  • pd.DataFrame – Dataframe with concatenated episodes

  • list – keys of each timestamp

neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, events_config=None, events_df=None, regressor_config=None, regressors_df=None)

Extends df periods number steps into future.

Parameters
  • df_columns (pd.DataFrame) – Dataframe columns

  • last_date (pd.Datetime) – last history date

  • periods (int) – number of future steps to predict

  • freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as D or M

  • events_config (OrderedDict) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

  • regressor_config (OrderedDict) – configuration for user specified regressors

  • regressors_df (pd.DataFrame) – containing column ds and one column for each of the external regressors

Returns

input df with ds extended into future, and y set to None

Return type

pd.DataFrame
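
Extending the ds column into the future comes down to generating periods timestamps after last_date. A minimal sketch, assuming last_date lies on the freq grid (true for D in this example) and leaving out events and regressors; future_dates_sketch is a hypothetical name, not the library's function.

```python
import pandas as pd


def future_dates_sketch(last_date, periods, freq):
    """Hypothetical stand-in: build `periods` rows strictly after `last_date`."""
    # date_range includes its start, so request one extra step and drop the first.
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)[1:]
    # y is unknown for future rows, so it is left as None.
    return pd.DataFrame({"ds": dates, "y": [None] * periods})


future = future_dates_sketch(pd.Timestamp("2022-01-31"), periods=3, freq="D")
```

The result holds February 1 to 3, 2022, with y left empty.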

neuralprophet.df_utils.maybe_get_single_df_from_df_dict(df_dict, received_unnamed_df=True)

Extract dataframe from single length dict if placeholder-named.

Parameters
  • df_dict (dict) – dict with potentially single pd.DataFrame

  • received_unnamed_df (bool) – whether the input was unnamed

Returns

original input format

Return type

pd.DataFrame or dict

neuralprophet.df_utils.normalize(df, data_params)

Applies data scaling factors to df using data_params.

Parameters
  • df (pd.DataFrame) – with columns ds, y, (and potentially more regressors)

  • data_params (OrderedDict) – scaling values, as returned by init_data_params with ShiftScale entries containing shift and scale parameters

Returns

normalized dataframes

Return type

pd.DataFrame

neuralprophet.df_utils.prep_copy_df_dict(df)

Creates or copies a df_dict based on the df input. It either converts a pd.DataFrame to a dict or copies it in case of a dict input.

Parameters

df (pd.DataFrame,dict) – containing df or dict with group of dfs

Returns

  • dict – dict of dataframes or copy of dict of dataframes

  • bool – whether the input was unnamed
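
The wrap-and-unwrap pattern behind prep_copy_df_dict and maybe_get_single_df_from_df_dict can be sketched as below. The placeholder key "__df__" and both function names are assumptions made for this illustration; the library's actual placeholder name is not stated in this reference.

```python
import pandas as pd

PLACEHOLDER_KEY = "__df__"  # hypothetical placeholder name for an unnamed dataframe


def prep_copy_sketch(df):
    """Return (dict of copied dataframes, whether the input was unnamed)."""
    if isinstance(df, pd.DataFrame):
        return {PLACEHOLDER_KEY: df.copy(deep=True)}, True
    if isinstance(df, dict):
        return {key: frame.copy(deep=True) for key, frame in df.items()}, False
    raise ValueError("expected a pd.DataFrame or a dict of pd.DataFrame")


def maybe_get_single_sketch(df_dict, received_unnamed_df=True):
    """Undo the wrapping when the input was a single unnamed dataframe."""
    if received_unnamed_df and len(df_dict) == 1 and PLACEHOLDER_KEY in df_dict:
        return df_dict[PLACEHOLDER_KEY]
    return df_dict


df = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=3, freq="D"), "y": [1.0, 2.0, 3.0]})
df_dict, unnamed = prep_copy_sketch(df)
restored = maybe_get_single_sketch(df_dict, unnamed)
```

Working on copies means later in-place changes do not leak back into the caller's dataframe, and the flag lets the caller get back the same format that was passed in.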

neuralprophet.df_utils.recover_dataframes(df_joined, episodes)

Recover dict of dataframes according to episodes.

Parameters
  • df_joined (pd.DataFrame) – Dataframe concatenated containing column ds, y with training data

  • episodes (List) – containing the episodes from each timestamp

Returns

Original dict before concatenation

Return type

dict of pd.DataFrame
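
The round trip between join_dataframes and recover_dataframes can be pictured as a concatenation that records, for every row, the key of the dataframe it came from. The sketch below is a hypothetical re-creation with pandas (join_sketch and recover_sketch are illustrative names), not the library's implementation.

```python
import pandas as pd


def join_sketch(df_dict):
    """Concatenate the frames and remember which key each row came from."""
    episodes = []
    for key, frame in df_dict.items():
        episodes.extend([key] * len(frame))
    joined = pd.concat(list(df_dict.values()), ignore_index=True)
    return joined, episodes


def recover_sketch(df_joined, episodes):
    """Rebuild the original dict from the joined frame and the per-row keys."""
    keys = pd.Series(episodes, index=df_joined.index)
    return {
        key: group.reset_index(drop=True)
        for key, group in df_joined.groupby(keys, sort=False)
    }


a = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=3, freq="D"), "y": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=2, freq="D"), "y": [10.0, 20.0]})
joined, episodes = join_sketch({"a": a, "b": b})
recovered = recover_sketch(joined, episodes)
```

Because the episode list has one entry per row, overlapping timestamps across series (as in a and b here) do not prevent recovery.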

neuralprophet.df_utils.split_considering_timestamp(df_dict, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)

Splits timeseries into train and validation sets according to given threshold_time_stamp.

Parameters
  • df_dict (dict) – dataframe or dict of dataframes containing column ds, y with all data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • threshold_time_stamp (str) – time stamp boundary that defines splitting of data

Returns

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data

neuralprophet.df_utils.split_df(df, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True, local_split=False)

Splits timeseries df into train and validation sets.

Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.

Parameters
  • df (pd.DataFrame or dict) – dataframe or dict of dataframes containing column ds, y with all data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • local_split (bool) – when set to true, each episode from a dict of dataframes will be split locally

Returns

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data
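
The effect of valid_p and inputs_overbleed can be illustrated on a plain ordered sequence. This is a simplified sketch, assuming a single series and ignoring n_forecasts and the library's exact sample counting; split_sketch is a hypothetical helper.

```python
def split_sketch(rows, n_lags, valid_p=0.2, inputs_overbleed=True):
    """Split an ordered sequence into (train, validation)."""
    n = len(rows)
    # A value in (0, 1) is a share of the data; larger values are a sample count.
    n_valid = max(1, int(n * valid_p)) if 0 < valid_p < 1 else int(valid_p)
    split = n - n_valid
    train = rows[:split]
    # With overbleed, the last n_lags training rows also serve as validation
    # inputs; they are never used as validation targets.
    val_start = max(0, split - n_lags) if inputs_overbleed else split
    return train, rows[val_start:]


rows = list(range(10))
train, val = split_sketch(rows, n_lags=2, valid_p=0.2)
train_strict, val_strict = split_sketch(rows, n_lags=2, valid_p=0.2, inputs_overbleed=False)
```

With overbleed enabled, the validation slice starts two rows earlier so that the first validation target has its lagged inputs available; the training slice is the same in both cases.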