Core Module Documentation#

class neuralprophet.df_utils.ShiftScale(shift: 'float' = 0.0, scale: 'float' = 1.0)#
neuralprophet.df_utils.add_missing_dates_nan(df, freq)#

Fills missing datetimes in ds, with NaN for all other columns except ID.

Parameters
  • df (pd.DataFrame) – dataframe with column ds containing datetimes

  • freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M

Returns

dataframe without date gaps, with NaN values in the newly added rows

Return type

pd.DataFrame
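
Examples

A minimal, illustrative sketch of filling a date gap (the dataframe and the daily frequency are assumptions, not taken from the library docs):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> ds = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"])  # gap on 2024-01-03
>>> df = pd.DataFrame({"ds": ds, "y": [1.0, 2.0, 4.0, 5.0]})
>>> df_filled = df_utils.add_missing_dates_nan(df, freq="D")  # inserts 2024-01-03 with y set to NaN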

neuralprophet.df_utils.add_quarter_condition(df: pandas.core.frame.DataFrame)#

Adds one condition column per quarter to df, for use with conditional seasonalities.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data

Returns

dataframe with added columns for conditional seasonalities

Note

Quarters correspond to northern hemisphere.

Return type

pd.DataFrame
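
Examples

An illustrative sketch of adding quarter condition columns; the input dataframe is an assumption, and the resulting columns can be used as condition columns for conditional seasonalities:

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=366, freq="D"), "y": range(366)})
>>> df_cond = df_utils.add_quarter_condition(df)  # adds one condition column per quarter (see Note on hemisphere)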

neuralprophet.df_utils.add_weekday_condition(df: pandas.core.frame.DataFrame)#

Adds weekday-based condition columns to df, for use with conditional seasonalities.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data

Returns

dataframe with added columns for conditional seasonalities

Return type

pd.DataFrame

neuralprophet.df_utils.check_dataframe(df: pandas.core.frame.DataFrame, check_y: bool = True, covariates=None, regressors=None, events=None, seasonalities=None, future: Optional[bool] = None) Tuple[pandas.core.frame.DataFrame, List, List]#

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters
  • df (pd.DataFrame) – containing column ds

  • check_y (bool) – whether df must contain y series values; set to True when training or predicting with autoregression

  • covariates (list or dict) – covariate column names

  • regressors (list or dict) – regressor column names

  • events (list or dict) – event column names

  • seasonalities (list or dict) – seasonalities column names

  • future (bool) – if df is a future dataframe

Returns

checked dataframe

Return type

Tuple[pd.DataFrame, List, List]
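
Examples

A minimal sketch following the annotated return type above; the names of the two returned lists are hypothetical:

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=10, freq="D"), "y": range(10)})
>>> df_checked, dropped_a, dropped_b = df_utils.check_dataframe(df, check_y=True)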

neuralprophet.df_utils.convert_events_to_features(df, config_events: ConfigEvents, events_df)#

Converts event information into binary features in df.

Parameters
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • config_events (configure.ConfigEvents) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

Returns

input df with added columns for the user-specified event features

Return type

pd.DataFrame

neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)#

Convert numeric frequencies into frequency tags

Parameters
  • freq_num (int) – numeric value of the time delta in ms

  • initial_time_stamp (str) – initial time stamp of data

Returns

frequency tag

Return type

str

neuralprophet.df_utils.convert_str_to_num_freq(freq_str)#

Convert frequency tags into numeric delta in ms

Parameters

freq_str (str) – frequency tag

Returns

frequency numeric delta in ms

Return type

numeric
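
Examples

An illustrative round trip between a frequency tag and its numeric delta, based on the two conversion helpers documented above (the unit of the delta is taken from this documentation):

>>> from neuralprophet import df_utils
>>> delta = df_utils.convert_str_to_num_freq("5min")             # numeric delta for a 5-minute frequency
>>> tag = df_utils.convert_num_to_str_freq(delta, "2024-01-01")  # back to a frequency tag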

neuralprophet.df_utils.create_dict_for_events_or_regressors(df: pandas.core.frame.DataFrame, other_df: Optional[pandas.core.frame.DataFrame], other_df_name: str) dict#

Create a dict for events or regressors according to input df.

Parameters
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values

  • other_df (pd.DataFrame) – Dataframe with events or regressors

  • other_df_name (str) – Definition of other_df (i.e. ‘events’, ‘regressors’)

Returns

dictionary with events or regressors

Return type

dict

neuralprophet.df_utils.create_dummy_datestamps(df, freq='S', startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0)#

Helper function to create a dummy series of datestamps for equidistant data without ds.

Parameters
  • df (pd.DataFrame) – dataframe with column y and without column ds

  • freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M

  • startyear (int) – Defines the first datestamp

  • startmonth (int) – Defines the first datestamp

  • startday (int) – Defines the first datestamp

  • starthour (int) – Defines the first datestamp

  • startminute (int) – Defines the first datestamp

  • startsecond (int) – Defines the first datestamp

Returns

dataframe with dummy equidistant datestamps

Return type

pd.DataFrame

Examples

Adding dummy datestamps to a dataframe without datestamps. To prepare the dataframe for training, import df_utils and choose your preferred start date and frequency.

>>> from neuralprophet import df_utils
>>> df_drop = df.drop("ds", axis=1)
>>> df_dummy = df_utils.create_dummy_datestamps(
...     df_drop, freq="S", startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0
... )

neuralprophet.df_utils.create_mask_for_prediction_frequency(prediction_frequency, ds, forecast_lag)#

Creates a mask for the yhat array, to select the correct values for the prediction frequency. This method is only called in _reshape_raw_predictions_to_forecst_df within NeuralProphet.predict().

Parameters
  • prediction_frequency (dict) – identical to NeuralProphet

  • ds (pd.Series) – datestamps of the predictions

  • forecast_lag (int) – current forecast lag

Returns

mask for the yhat array

Return type

np.array

neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0, global_model_cv_type='global-time')#

Splits data in k folds for crossvalidation.

Parameters
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • fold_pct (float) – percentage of overall samples to be in each fold

  • fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)

  • global_model_cv_type (str) –

    Type of crossvalidation to apply to the time series.

    options:
    • global-time (default): crossvalidation is performed according to a time stamp threshold

    • local: each episode will be crossvalidated locally (may cause time leakage among different episodes)

    • intersect: only the time intersection of all the episodes will be considered. A considerable amount of data may not be used. However, this approach guarantees an equal number of train/test samples for each episode.

Returns

training data

validation data

Return type

list of k tuples [(df_train, df_val), …]
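
Examples

A minimal sketch of a 5-fold crossvalidation split (all parameter values are illustrative):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=200, freq="D"), "y": range(200)})
>>> df, _, _, _ = df_utils.prep_or_copy_df(df)  # ensure an ID column is present
>>> folds = df_utils.crossvalidation_split_df(df, n_lags=0, n_forecasts=1, k=5, fold_pct=0.1, fold_overlap_pct=0.0)
>>> df_train, df_val = folds[0]  # each fold is a (train, validation) pair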

neuralprophet.df_utils.data_params_definition(df, normalize, config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, local_run_despite_global: Optional[bool] = None)#

Initialize data scaling values.

Note

We apply z-normalization to the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.

Parameters
  • df (pd.DataFrame) – Time series to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    options:
    • soft (default), unless the time series is binary, in which case minmax is applied

    • off: bypasses data normalization

    • minmax: scales the minimum value to 0.0 and the maximum value to 1.0

    • standardize: zero-centers and divides by the standard deviation

    • soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

    • soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9

  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) with sub_parameters normalize (bool)

  • config_events (configure.ConfigEvents) – user specified events configs

  • config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs

Returns

scaling values with ShiftScale entries containing shift and scale parameters.

Return type

OrderedDict

neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)#

Splits data in two sets of k folds for crossvalidation on validation and test data.

Parameters
  • df (pd.DataFrame) – data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • k (int) – number of CV folds

  • valid_pct (float) – percentage of overall samples to be in validation

  • test_pct (float) – percentage of overall samples to be in test

Returns

elements same as crossvalidation_split_df() returns

Return type

tuple of k tuples [(folds_val, folds_test), …]

neuralprophet.df_utils.drop_missing_from_df(df, drop_missing, predict_steps, n_lags)#

Drops windows of missing values in df according to the (lagged) samples that are dropped from TimeDataset.

Parameters
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • drop_missing (bool) – identical to NeuralProphet

  • predict_steps (int) – number of steps to predict into the future (identical to NeuralProphet n_forecasts)

  • n_lags (int) – identical to NeuralProphet

Returns

dataframe with dropped NaN windows

Return type

pd.DataFrame

neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)#

Fills missing values with linear interpolation up to limit_linear consecutive values, then imputes remaining gaps with a rolling average.

Parameters
  • series (pd.Series) – series with nan to be filled in.

  • limit_linear (int) –

    maximum number of missing values to impute.

    Note

    because imputation is done in both directions, this value is effectively doubled.

  • rolling (int) –

    window size of the rolling average used to impute remaining missing values.

    Note

    window width is rolling + 2*limit_linear

Returns

manipulated dataframe containing filled values

Return type

pd.DataFrame
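
Examples

An illustrative sketch of imputing NaN values in a series (the series and limits are arbitrary):

>>> import numpy as np
>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> series = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, np.nan, 7.0])
>>> filled = df_utils.fill_linear_then_rolling_avg(series, limit_linear=2, rolling=3)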

neuralprophet.df_utils.find_time_threshold(df, n_lags, n_forecasts, valid_p, inputs_overbleed)#

Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.

Parameters
  • df (pd.DataFrame) – data with column ds, y, and ID

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float) – fraction (0,1) of data to use for holdout validation set

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

Returns

time stamp threshold that defines the boundary between the train and validation sets

Return type

str

neuralprophet.df_utils.find_valid_time_interval_for_cv(df)#

Find the time interval of intersection among all the time series in df.

Parameters

df (pd.DataFrame) – data with column ds, y, and ID

Returns

  • str – time interval start

  • str – time interval end

neuralprophet.df_utils.get_dist_considering_two_freqs(dist)#

Sum the occurrence counts of the two most common frequencies.

Note

Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).

Parameters

dist (list) – list of occurrence counts of frequencies

Returns

sum of the occurrence counts of the two most common frequencies

Return type

numeric

neuralprophet.df_utils.get_freq_dist(ds_col)#

Get frequency distribution of ds column.

Parameters

ds_col (pd.DataFrame) – ds column of dataframe

Returns

numeric delta values (ms) and distribution of frequency counts

Return type

tuple

neuralprophet.df_utils.get_max_num_lags(config_lagged_regressors: Optional[ConfigLaggedRegressors], n_lags: int) int#

Get the greatest number of lags between the autoregression lags and the covariates lags.

Parameters
  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • n_lags (int) – number of lagged values of series to include as model inputs

Returns

Maximum number of lags between the autoregression lags and the covariates lags.

Return type

int
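
Examples

A minimal sketch without lagged regressors (the expected result is an assumption based on the description above):

>>> from neuralprophet import df_utils
>>> max_lags = df_utils.get_max_num_lags(config_lagged_regressors=None, n_lags=7)  # expected to equal n_lags here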

neuralprophet.df_utils.handle_negative_values(df, col, handle_negatives)#

Handles negative values in a column according to the handle_negatives parameter.

Parameters
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • col (str) – name of the regressor column

  • handle_negatives (str, int, float) –

    specified handling of negative values in the regressor column. Can be one of the following options:

    Options
    • remove: Remove all negative values of the regressor.

    • error: Raise an error in case of a negative value.

    • float or int: Replace negative values with the provided value.

    • (default) None: Do not handle negative values.

Returns

dataframe with handled negative values

Return type

pd.DataFrame
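
Examples

An illustrative sketch of the handling options (the regressor column temperature is hypothetical):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=4, freq="D"),
...                    "y": [1.0, 2.0, 3.0, 4.0],
...                    "temperature": [5.0, -1.0, 3.0, -2.0]})
>>> df_replaced = df_utils.handle_negative_values(df, col="temperature", handle_negatives=0.0)  # replace negatives with 0.0
>>> df_removed = df_utils.handle_negative_values(df, col="temperature", handle_negatives="remove")  # remove negative values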

neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)#

Automatically infers frequency of dataframe.

Parameters
  • df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values, and optionally ID

  • freq (str) –

    Data step sizes, i.e. frequency of data recording,

    Note

    Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.

  • n_lags (int) – identical to NeuralProphet

  • min_freq_percentage (float) – threshold for defining the major frequency of data (default: 0.7)

Returns

Valid frequency tag according to major frequency.

Return type

str
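
Examples

A minimal sketch of automatic frequency inference (the hourly dataframe is an assumption):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=48, freq="H"), "y": range(48)})
>>> freq = df_utils.infer_frequency(df, freq="auto", n_lags=0)  # expected to resolve to an hourly frequency tag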

neuralprophet.df_utils.init_data_params(df, normalize='auto', config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, global_normalization=False, global_time_normalization=False)#

Initialize data scaling values.

Note

We compute and store local and global normalization parameters independent of settings.

Parameters
  • df (pd.DataFrame) – data to compute normalization parameters from.

  • normalize (str) –

    Type of normalization to apply to the time series.

    options:
    • auto (default): soft, unless the time series is binary, in which case minmax is applied

    • off: bypasses data normalization

    • minmax: scales the minimum value to 0.0 and the maximum value to 1.0

    • standardize: zero-centers and divides by the standard deviation

    • soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

    • soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9

  • config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors

  • config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values)

  • config_events (configure.ConfigEvents) – user specified events configs

  • config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs

  • global_normalization (bool) –

    True: sets global modeling training with global normalization

    False: sets global modeling training with local normalization

  • global_time_normalization (bool) –

    True: normalize time globally across all time series

    False: normalize time locally for each time series

    (only valid in case of global modeling - local normalization)

Returns

  • OrderedDict – nested dict with data_params for each dataset where each contains

  • OrderedDict – ShiftScale entries containing shift and scale parameters for each column

neuralprophet.df_utils.join_dfs_after_data_drop(predicted, df, merge=False)#

Creates the intersection between df and predicted, removing any dates that have been imputed and dropped in NeuralProphet.predict().

Parameters
  • df (pd.DataFrame) – dataframe containing column ds, y with all data

  • predicted (pd.DataFrame) – output dataframe of NeuralProphet.predict.

  • merge (bool) –

    whether to merge predicted and df into one dataframe

    Options
    • (default) False: Returns separate dataframes

    • True: Merges predicted and df into one dataframe

Returns

dataframe with the imputed-and-dropped dates removed

Return type

pd.DataFrame

neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, config_events: ConfigEvents, config_regressors: ConfigFutureRegressors, events_df=None, regressors_df=None)#

Extends df by periods steps into the future.

Parameters
  • df_columns (pd.DataFrame) – Dataframe columns

  • last_date (pd.Datetime) – last history date

  • periods (int) – number of future steps to predict

  • freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as D or M

  • config_events (configure.ConfigEvents) – User specified events configs

  • events_df (pd.DataFrame) – containing column ds and event

  • config_regressors (configure.ConfigFutureRegressors) – configuration for user-specified regressors

  • regressors_df (pd.DataFrame) – containing column ds and one column for each of the external regressors

Returns

input df with ds extended into future, and y set to None

Return type

pd.DataFrame

neuralprophet.df_utils.merge_dataframes(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame#

Join dataframes for procedures such as splitting data, setting auto seasonalities, and others.

Parameters

df (pd.DataFrame) – containing column ds, y, and ID with data

Returns

Dataframe with concatenated time series (sorted ‘ds’, duplicates removed, index reset)

Return type

pd.DataFrame

neuralprophet.df_utils.normalize(df, data_params)#

Applies data scaling factors to df using data_params.

Parameters
  • df (pd.DataFrame) – with columns ds, y, (and potentially more regressors)

  • data_params (OrderedDict) – scaling values, as returned by init_data_params with ShiftScale entries containing shift and scale parameters

Returns

normalized dataframe

Return type

pd.DataFrame
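
Examples

An illustrative sketch that builds scaling parameters with data_params_definition and applies them with normalize (the normalization mode and the direct reuse of the returned dict are assumptions for illustration):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=100, freq="D"), "y": range(100)})
>>> data_params = df_utils.data_params_definition(df, normalize="soft")  # OrderedDict of ShiftScale entries
>>> df_norm = df_utils.normalize(df, data_params)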

neuralprophet.df_utils.prep_or_copy_df(df: pandas.core.frame.DataFrame) tuple[pandas.core.frame.DataFrame, bool, bool, list[str]]#

Copy df if it contains the ID column. Creates an ID column with value ‘__df__’ if df contains a single time series.

Parameters

df (pd.DataFrame) – df or dict containing data

Returns

  • pd.DataFrame – df with ID column

  • bool – whether the ID column was present

  • bool – whether it is a single time series

  • list – list of IDs
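
Examples

An illustrative sketch of the four returned values for a single time series without an ID column (variable names follow the Returns list above):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=3, freq="D"), "y": [1.0, 2.0, 3.0]})
>>> df_with_id, received_id_col, single_series, id_list = df_utils.prep_or_copy_df(df)
>>> # expected: received_id_col is False, single_series is True, and id_list contains the dummy ID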

neuralprophet.df_utils.return_df_in_original_format(df, received_ID_col=False, received_single_time_series=True)#

Return dataframe in the original format.

Parameters
  • df (pd.DataFrame) – df with data

  • received_ID_col (bool) – whether the ID col was present

  • received_single_time_series (bool) – whether it is a single time series

Returns

original input format

Return type

pd.DataFrame

neuralprophet.df_utils.split_considering_timestamp(df, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)#

Splits timeseries into train and validation sets according to given threshold_time_stamp.

Parameters
  • df (pd.DataFrame) – data with column ds, y, and ID

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • threshold_time_stamp (str) – time stamp boundary that defines splitting of data

Returns

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data

neuralprophet.df_utils.split_df(df: pandas.core.frame.DataFrame, n_lags: int, n_forecasts: int, valid_p: float = 0.2, inputs_overbleed: bool = True, local_split: bool = False)#

Splits timeseries df into train and validation sets.

Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.

Parameters
  • df (pd.DataFrame) – dataframe containing column ds, y, and optionally ID with all data

  • n_lags (int) – identical to NeuralProphet

  • n_forecasts (int) – identical to NeuralProphet

  • valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1

  • inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

  • local_split (bool) – when set to true, each episode from a dict of dataframes will be split locally

Returns

  • pd.DataFrame, dict – training data

  • pd.DataFrame, dict – validation data
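
Examples

A minimal sketch of a train/validation split (all parameter values are illustrative):

>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=100, freq="D"), "y": range(100)})
>>> df, _, _, _ = df_utils.prep_or_copy_df(df)  # ensure an ID column is present
>>> df_train, df_val = df_utils.split_df(df, n_lags=0, n_forecasts=1, valid_p=0.2)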

neuralprophet.df_utils.unfold_dict_of_folds(folds_dict, k)#

Convert a dict of folds into the typical format of train and validation fold tuples.

Parameters
  • folds_dict (dict) – dict of folds

  • k (int) – number of folds initially set

Returns

training data

validation data

Return type

list of k tuples [(df_train, df_val), …]