Core Module Documentation

class neuralprophet.df_utils.ShiftScale(shift: 'float' = 0.0, scale: 'float' = 1.0)
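A minimal sketch of what a shift/scale pair is for, assuming the normalization convention (x - shift) / scale and its inverse for denormalization. `ShiftScaleSketch` is a hypothetical stand-in written with the standard library only; it is not the library class.

```python
from dataclasses import dataclass


@dataclass
class ShiftScaleSketch:
    """Hypothetical stand-in for ShiftScale: stores a shift and a scale."""

    shift: float = 0.0
    scale: float = 1.0

    def apply(self, x: float) -> float:
        # Assumed convention: subtract the shift, then divide by the scale.
        return (x - self.shift) / self.scale

    def invert(self, z: float) -> float:
        # Undo apply(): multiply by the scale, then add the shift back.
        return z * self.scale + self.shift


params = ShiftScaleSketch(shift=10.0, scale=5.0)
z = params.apply(20.0)  # (20 - 10) / 5
x = params.invert(z)    # back to the original value
```

With the defaults (shift 0.0, scale 1.0), the transformation is the identity.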

neuralprophet.df_utils.add_missing_dates_nan(df, freq)

Fills missing datetimes in ds, with NaN for all other columns except ID.

Parameters

df (pd.DataFrame) – dataframe with column ds containing datetimes
freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M

Returns

dataframe without date gaps, with NaN values in the added rows

Return type

pd.DataFrame
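The gap-filling idea can be sketched without pandas. This is a simplified illustration, not the library code: plain `(ds, y)` tuples stand in for the DataFrame, a fixed `timedelta` stands in for the frequency string, and the input is assumed to be sorted by date. `add_missing_dates_nan_sketch` is a hypothetical name.

```python
from datetime import date, timedelta


def add_missing_dates_nan_sketch(rows, step=timedelta(days=1)):
    """Fill gaps in a date-sorted list of (ds, y) pairs with NaN values."""
    by_date = dict(rows)
    start, end = rows[0][0], rows[-1][0]
    filled = []
    current = start
    while current <= end:
        # Keep the observed value if present, otherwise insert NaN.
        filled.append((current, by_date.get(current, float("nan"))))
        current += step
    return filled


rows = [(date(2024, 1, 1), 1.0), (date(2024, 1, 4), 4.0)]
result = add_missing_dates_nan_sketch(rows)
# result has one row per day; Jan 2 and Jan 3 carry NaN.
```

In pandas, the same effect is commonly achieved by reindexing a ds-indexed frame on `pd.date_range(start, end, freq=freq)`.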

neuralprophet.df_utils.add_quarter_condition(df: pandas.core.frame.DataFrame)

Adds columns for quarter-based conditional seasonalities to the df.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data

Returns

dataframe with added columns for conditional seasonalities

Note

Quarters correspond to the northern hemisphere.

Return type

pd.DataFrame
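A sketch of quarter-based condition flags, assuming each "quarter" is a northern-hemisphere meteorological season (Dec to Feb is winter, and so on). The season names used as keys here are assumptions for illustration and may not match the column names the library adds.

```python
from datetime import date

# Assumed northern-hemisphere mapping from month to season.
SEASON_MONTHS = {
    "winter": {12, 1, 2},
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "fall": {9, 10, 11},
}


def quarter_condition_flags(ds: date) -> dict:
    """Return one 0/1 flag per season for a single datestamp."""
    return {name: int(ds.month in months) for name, months in SEASON_MONTHS.items()}


flags = quarter_condition_flags(date(2024, 7, 15))
# July falls in summer, so only the "summer" flag is 1.
```

Exactly one flag is set for every date, which is what lets each flag act as an on/off switch for its own seasonal component.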

neuralprophet.df_utils.add_weekday_condition(df: pandas.core.frame.DataFrame)

Adds columns for weekday-based conditional seasonalities to the df.

Parameters: df (pd.DataFrame) – dataframe containing column ds, y with all data
Returns: dataframe with added columns for conditional seasonalities
Return type: pd.DataFrame
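A matching sketch for weekday conditions, assuming the condition splits the week into weekend and weekday. The key names `weekend` and `weekday` are assumptions for illustration, not confirmed column names.

```python
from datetime import date


def weekday_condition_flags(ds: date) -> dict:
    """Return 0/1 flags marking whether ds falls on a weekend or a weekday."""
    is_weekend = ds.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    return {"weekend": int(is_weekend), "weekday": int(not is_weekend)}


saturday = weekday_condition_flags(date(2024, 1, 6))
monday = weekday_condition_flags(date(2024, 1, 8))
```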

neuralprophet.df_utils.check_dataframe(df: pandas.core.frame.DataFrame, check_y: bool = True, covariates=None, regressors=None, events=None, seasonalities=None, future: Optional[bool] = None) → Tuple[pandas.core.frame.DataFrame, List, List]

Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.

Parameters

df (pd.DataFrame) – containing column ds
check_y (bool) – whether df must contain series values (y); set to True if training or predicting with autoregression
covariates (list or dict) – covariate column names
regressors (list or dict) – regressor column names
events (list or dict) – event column names
seasonalities (list or dict) – seasonality column names
future (bool) – whether df is a future dataframe

Returns

checked dataframe, together with two lists

Return type

Tuple[pd.DataFrame, List, List]

neuralprophet.df_utils.convert_events_to_features(df, config_events: ConfigEvents, events_df)

Converts event information into binary features of the df.

Parameters

df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values
config_events (configure.ConfigEvents) – User specified events configs
events_df (pd.DataFrame) – containing column ds and event

Returns

input df with columns for user-specified features

Return type

pd.DataFrame

neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)

Convert numeric frequencies into frequency tags

Parameters

freq_num (int) – numeric value of the delta in ms
initial_time_stamp (str) – initial time stamp of data

Returns

frequency tag

Return type

str

neuralprophet.df_utils.convert_str_to_num_freq(freq_str)

Convert frequency tags into numeric delta in ms

Parameters: freq_str (str) – frequency tag
Returns: frequency numeric delta in ms
Return type: numeric
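The two conversions are inverses of each other for fixed-length frequencies. A sketch under that assumption: `FIXED_FREQS_MS`, `str_to_num_freq` and `num_to_str_freq` are hypothetical names, and the table only covers tags with a constant length. Calendar-based tags such as M or Y have no fixed length in ms, which is presumably why the documented reverse conversion also takes an initial time stamp.

```python
from datetime import timedelta

ONE_MS = timedelta(milliseconds=1)

# Hypothetical table of fixed-length frequency tags and their deltas in ms.
FIXED_FREQS_MS = {
    "s": timedelta(seconds=1) // ONE_MS,
    "min": timedelta(minutes=1) // ONE_MS,
    "h": timedelta(hours=1) // ONE_MS,
    "D": timedelta(days=1) // ONE_MS,
    "W": timedelta(weeks=1) // ONE_MS,
}


def str_to_num_freq(freq_str: str) -> int:
    """Return the delta in milliseconds for a fixed-length frequency tag."""
    return FIXED_FREQS_MS[freq_str]


def num_to_str_freq(freq_num: int) -> str:
    """Return the tag whose delta equals freq_num milliseconds."""
    for tag, ms in FIXED_FREQS_MS.items():
        if ms == freq_num:
            return tag
    raise ValueError(f"no fixed-length tag for {freq_num} ms")
```

For example, `str_to_num_freq("D")` gives 86,400,000 ms (24 × 60 × 60 × 1000).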

neuralprophet.df_utils.create_dict_for_events_or_regressors(df: pandas.core.frame.DataFrame, other_df: Optional[pandas.core.frame.DataFrame], other_df_name: str) → dict

Create a dict for events or regressors according to input df.

Parameters

df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values
other_df (pd.DataFrame) – Dataframe with events or regressors
other_df_name (str) – Definition of other_df (either ‘events’ or ‘regressors’)

Returns

dictionary with events or regressors

Return type

dict

neuralprophet.df_utils.create_dummy_datestamps(df, freq='S', startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0)

Helper function to create a dummy series of datestamps for equidistant data without ds.

Parameters

df (pd.DataFrame) – dataframe with column ‘y’ and without column ‘ds’
freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M
startyear (int) – year of the first datestamp
startmonth (int) – month of the first datestamp
startday (int) – day of the first datestamp
starthour (int) – hour of the first datestamp
startminute (int) – minute of the first datestamp
startsecond (int) – second of the first datestamp

Returns: dataframe with dummy equidistant datestamps
Return type: pd.DataFrame

Examples

Adding dummy datestamps to a dataframe without datestamps. To prepare the dataframe for training, import df_utils and insert your preferred dates.

>>> from neuralprophet import df_utils
>>> df_drop = df.drop("ds", axis=1)
>>> df_dummy = df_utils.create_dummy_datestamps(
...     df_drop, freq="S", startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0
... )

neuralprophet.df_utils.create_mask_for_prediction_frequency(prediction_frequency, ds, forecast_lag)

Creates a mask for the yhat array, to select the correct values for the prediction frequency. This method is only called in _reshape_raw_predictions_to_forecst_df within NeuralProphet.predict().

Parameters

prediction_frequency (dict) – identical to NeuralProphet
ds (pd.Series) – datestamps of the predictions
forecast_lag (int) – current forecast lag

Returns

mask for the yhat array

Return type

np.array

neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0, global_model_cv_type='global-time')

Splits data into k folds for crossvalidation.

Parameters

df (pd.DataFrame) – data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
k (int) – number of CV folds
fold_pct (float) – percentage of overall samples to be in each fold
fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)
global_model_cv_type (str) –
Type of crossvalidation to apply to the time series.

options:

global-time (default): crossvalidation is performed according to a time stamp threshold.

local: each episode is crossvalidated locally (may cause time leakage among different episodes).

intersect: only the time intersection of all the episodes is considered. A considerable amount of data may not be used. However, this approach guarantees an equal number of train/test samples for each episode.

Returns

training data

validation data

Return type

list of k tuples [(df_train, df_val), …]
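The fold layout can be sketched on sample indices alone. This is a simplified illustration, not the library implementation: it ignores the n_lags and n_forecasts adjustments and the global/local/intersect handling of multiple series, and `crossvalidation_fold_ranges` is a hypothetical name. Each validation window holds `fold_pct` of the samples, successive windows move back in time by the fold size minus the overlap, and every fold trains on everything before its validation window.

```python
def crossvalidation_fold_ranges(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Return k (train, val) index ranges for expanding-window time-series CV."""
    fold_size = max(1, int(fold_pct * n_samples))
    overlap = int(fold_overlap_pct * fold_size)
    if overlap >= fold_size:
        raise ValueError("overlap must be smaller than the fold size")
    step = fold_size - overlap
    folds = []
    end = n_samples
    for _ in range(k):
        val_start = end - fold_size
        if val_start <= 0:
            raise ValueError("not enough samples for k folds")
        # Training data is everything strictly before the validation window.
        folds.append((range(0, val_start), range(val_start, end)))
        # Move the window back; overlapping samples are shared between folds.
        end -= step
    return folds[::-1]  # earliest fold first


folds = crossvalidation_fold_ranges(n_samples=100, k=3, fold_pct=0.1)
```

Because training always precedes validation, no fold trains on data from its own future, which is the point of time-series crossvalidation.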

neuralprophet.df_utils.data_params_definition(df, normalize, config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, local_run_despite_global: Optional[bool] = None)

Initialize data scaling values.

Note

We do a z normalization on the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.

Parameters

df (pd.DataFrame) – Time series to compute normalization parameters from.
normalize (str) –
Type of normalization to apply to the time series.

options:

auto (default): soft, unless the time series is binary, in which case minmax is applied.

off: bypasses data normalization

minmax: scales the minimum value to 0.0 and the maximum value to 1.0

standardize: zero-centers and divides by the standard deviation

soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors
config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) with sub-parameter normalize (bool)
config_events (configure.ConfigEvents) – user specified events configs
config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs

Returns

scaling values with ShiftScale entries containing shift and scale parameters.

Return type

OrderedDict
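The normalization options translate into a (shift, scale) pair such that (x - shift) / scale has the described range. The sketch below derives that pair for each option as read from the list above; it is not the library's code. Assumptions: quantiles use linear interpolation between closest ranks, standardize uses the population standard deviation, and a zero scale is replaced by 1.0. `shift_scale_for` and `_linear_quantile` are hypothetical names.

```python
import statistics


def _linear_quantile(sorted_vals, q):
    """Quantile with linear interpolation between closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def shift_scale_for(values, normalize="auto"):
    """Return (shift, scale) so that (x - shift) / scale matches the mode."""
    vals = sorted(values)
    if normalize == "auto":
        # Binary series get minmax, everything else gets soft.
        normalize = "minmax" if set(vals) <= {0, 1} else "soft"
    if normalize == "off":
        shift, scale = 0.0, 1.0
    elif normalize == "minmax":
        shift, scale = vals[0], vals[-1] - vals[0]
    elif normalize == "standardize":
        # Assumed: population standard deviation (ddof=0).
        shift, scale = statistics.fmean(vals), statistics.pstdev(vals)
    elif normalize == "soft":
        # min -> 0.0, 95th quantile -> 1.0
        shift, scale = vals[0], _linear_quantile(vals, 0.95) - vals[0]
    elif normalize == "soft1":
        # min -> 0.1, 90th quantile -> 0.9, i.e. 0.8 units between them
        scale = (_linear_quantile(vals, 0.90) - vals[0]) / 0.8
        shift = vals[0] - 0.1 * scale
    else:
        raise ValueError(f"unknown normalize mode: {normalize}")
    if scale == 0:
        scale = 1.0  # avoid division by zero for constant series
    return shift, scale
```

For soft1, solving (min - shift) / scale = 0.1 and (q90 - shift) / scale = 0.9 gives scale = (q90 - min) / 0.8 and shift = min - 0.1 · scale, which is what the code computes.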

neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)

Splits data into two sets of k folds for crossvalidation on validation and test data.

Parameters

df (pd.DataFrame) – data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
k (int) – number of CV folds
valid_pct (float) – percentage of overall samples to be in validation
test_pct (float) – percentage of overall samples to be in test

Returns

each element has the same format as the return value of crossvalidation_split_df()

Return type

tuple of k tuples [(folds_val, folds_test), …]

neuralprophet.df_utils.drop_missing_from_df(df, drop_missing, predict_steps, n_lags)

Drops windows of missing values in df according to the (lagged) samples that are dropped from TimeDataset.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data
drop_missing (bool) – identical to NeuralProphet
predict_steps (int) – number of steps to predict
n_lags (int) – identical to NeuralProphet

Returns

dataframe with dropped NaN windows

Return type

pd.DataFrame

neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)

Fills missing values in a series, first by linear imputation and then by a rolling average.

Parameters

series (pd.Series) – series with nan to be filled in.
limit_linear (int) –
maximum number of missing values to impute.

Note

Because imputation is done in both directions, this value is effectively doubled.
rolling (int) –
maximum number of missing values to impute.

Note

Window width is rolling + 2*limit_linear.

Returns

manipulated dataframe containing filled values

Return type

pd.DataFrame
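The two-stage fill can be sketched over a plain list of floats. This is a simplified illustration rather than the library implementation: step 1 interpolates linearly but only up to `limit_linear` positions from each side of a gap, and step 2 fills what is left with the mean of the non-missing values in a centered window of width rolling + 2*limit_linear, as in the Note above. How the real function handles window edges or a minimum number of observations per window is not documented here and may differ. `fill_linear_then_rolling_avg_sketch` is a hypothetical name.

```python
import math


def fill_linear_then_rolling_avg_sketch(values, limit_linear, rolling):
    """Fill NaNs: linear interpolation near known values, then a centered mean."""
    out = list(values)
    n = len(out)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Step 1: linear interpolation, at most limit_linear steps from each side
    # of a gap (so up to 2 * limit_linear values per gap in total).
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            if i - left <= limit_linear or right - i <= limit_linear:
                frac = (i - left) / gap
                out[i] = out[left] + (out[right] - out[left]) * frac
    # Step 2: centered mean over the values available after step 1.
    half = (rolling + 2 * limit_linear) // 2
    filled = list(out)
    for i in range(n):
        if math.isnan(out[i]):
            window = [
                out[j]
                for j in range(max(0, i - half), min(n, i + half + 1))
                if not math.isnan(out[j])
            ]
            if window:
                filled[i] = sum(window) / len(window)
    return filled
```

Values with no known neighbor inside the window stay NaN, so a caller may still need to check for remaining gaps.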

neuralprophet.df_utils.find_time_threshold(df, n_lags, n_forecasts, valid_p, inputs_overbleed)

Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.

Parameters

df (pd.DataFrame) – data with column ds, y, and ID
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
valid_p (float) – fraction (0,1) of data to use for holdout validation set
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)

Returns

time stamp threshold that defines the boundary between the train and validation sets

Return type

str

neuralprophet.df_utils.find_valid_time_interval_for_cv(df)

Finds the time interval of intersection among all the time series in df.

Parameters

df (pd.DataFrame) – data with column ds, y, and ID

Returns

str – time interval start
str – time interval end
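The intersection of several time ranges starts at the latest start and ends at the earliest end. A sketch of that rule, assuming a plain `{ID: [datestamps]}` dict in place of the ds/ID columns; it returns dates rather than strings. `common_time_interval` is a hypothetical name.

```python
from datetime import date


def common_time_interval(series_by_id):
    """Return (start, end) of the interval covered by every series."""
    start = max(min(ds) for ds in series_by_id.values())
    end = min(max(ds) for ds in series_by_id.values())
    if start > end:
        raise ValueError("the time series do not overlap")
    return start, end


a = [date(2024, 1, d) for d in range(1, 11)]   # Jan 1 to Jan 10
b = [date(2024, 1, d) for d in range(5, 21)]   # Jan 5 to Jan 20
interval = common_time_interval({"a": a, "b": b})
```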

neuralprophet.df_utils.get_dist_considering_two_freqs(dist)

Adds the occurrences of the two most common frequencies.

Note

Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).

Parameters: dist (list) – list of occurrences of frequencies
Returns: sum of the occurrences of the two most common frequencies
Return type: numeric

neuralprophet.df_utils.get_freq_dist(ds_col)

Get frequency distribution of ds column.

Parameters: ds_col (pd.DataFrame) – ds column of dataframe
Returns: numeric delta values (ms) and distribution of frequency counts
Return type: tuple
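Why two frequencies are summed becomes clear with monthly data: the steps between month starts are 28, 29, 30 or 31 days long, so no single step size dominates. The sketch below counts step sizes in ms and sums the two most common counts. `freq_dist` and `two_most_common_share` are hypothetical names, and plain datetimes stand in for the ds column.

```python
from collections import Counter
from datetime import datetime


def freq_dist(timestamps):
    """Count how often each step size (in ms) occurs between consecutive stamps."""
    deltas_ms = [
        int((b - a).total_seconds() * 1000) for a, b in zip(timestamps, timestamps[1:])
    ]
    return Counter(deltas_ms)


def two_most_common_share(dist: Counter) -> int:
    """Sum the counts of the two most common step sizes."""
    return sum(count for _, count in dist.most_common(2))


stamps = [datetime(2023, m, 1) for m in range(1, 13)]  # month starts in 2023
dist = freq_dist(stamps)
```

For 2023, the 11 steps split into six 31-day steps, four 30-day steps and one 28-day step, so the two most common step sizes together cover 10 of the 11 steps.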

neuralprophet.df_utils.get_max_num_lags(config_lagged_regressors: Optional[ConfigLaggedRegressors], n_lags: int) → int

Gets the largest number of lags among the autoregression lags and the covariate (lagged regressor) lags.

Parameters

config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors
n_lags (int) – number of lagged values of series to include as model inputs

Returns

Maximum number of lags among the autoregression lags and the covariate lags.

Return type

int
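The rule is a plain maximum. A sketch, assuming the lagged-regressor configuration can be reduced to a `{name: n_lags}` mapping; that mapping and the name `get_max_num_lags_sketch` are hypothetical stand-ins for ConfigLaggedRegressors and the documented function.

```python
from typing import Mapping, Optional


def get_max_num_lags_sketch(
    lagged_regressor_lags: Optional[Mapping[str, int]], n_lags: int
) -> int:
    """Largest lag count among autoregression and lagged regressors."""
    if not lagged_regressor_lags:
        return n_lags
    return max(n_lags, *lagged_regressor_lags.values())
```

The maximum matters because every sample needs enough history to fill the longest lag window of any input.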

neuralprophet.df_utils.handle_negative_values(df, col, handle_negatives)

Handles negative values in a column according to the handle_negatives parameter.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data
col (str) – name of the regressor column
handle_negatives (str, int, float) –
specified handling of negative values in the regressor column. Can be one of the following options:
Options
- remove: Remove all negative values of the regressor.
- error: Raise an error in case of a negative value.
- float or int: Replace negative values with the provided value.
- (default) None: Do not handle negative values.

Returns

dataframe with handled negative values

Return type

pd.DataFrame
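The four options can be sketched on a plain list of numbers. This is an illustration of the option semantics listed above, not the library code. Here "remove" simply drops the negative entries from the list, while the documented function operates on a dataframe column. `handle_negative_values_sketch` is a hypothetical name.

```python
def handle_negative_values_sketch(values, handle_negatives=None):
    """Apply one of the documented handling options to a list of numbers."""
    if handle_negatives is None:
        # Default: leave negative values untouched.
        return list(values)
    if handle_negatives == "remove":
        return [v for v in values if v >= 0]
    if handle_negatives == "error":
        if any(v < 0 for v in values):
            raise ValueError("negative values found")
        return list(values)
    if isinstance(handle_negatives, (int, float)):
        # Replace each negative value with the provided number.
        return [handle_negatives if v < 0 else v for v in values]
    raise ValueError(f"unknown option: {handle_negatives!r}")
```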

neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)

Automatically infers the frequency of the dataframe.

Parameters

df (pd.DataFrame) – Dataframe with columns ds datestamps and y time series values, and optionally ID
freq (str) –
Data step sizes, i.e. frequency of data recording,

Note

Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
n_lags (int) – identical to NeuralProphet
min_freq_percentage (float) – threshold for defining the major frequency of data (default: 0.7)

Returns

Valid frequency tag according to major frequency.

Return type

str
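The role of min_freq_percentage can be sketched as a share threshold on the most common step between timestamps. This is a simplified reading of the parameter description, not the library's inference logic; it returns a `timedelta` instead of a frequency tag, and `major_step` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def major_step(timestamps, min_freq_percentage=0.7):
    """Return the dominant step between timestamps, if it is common enough."""
    steps = Counter(b - a for a, b in zip(timestamps, timestamps[1:]))
    step, count = steps.most_common(1)[0]
    share = count / sum(steps.values())
    if share < min_freq_percentage:
        raise ValueError(f"major step covers only {share:.0%} of the data")
    return step


base = datetime(2024, 1, 1)
# Hourly data with one missing hour: 10 one-hour steps and 1 two-hour step.
hourly = [base + timedelta(hours=h) for h in range(11)] + [base + timedelta(hours=12)]
```

A single gap does not change the inferred step, because the one-hour step still covers 10 of 11 steps (about 91%), above the 0.7 threshold.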

neuralprophet.df_utils.init_data_params(df, normalize='auto', config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, global_normalization=False, global_time_normalization=False)

Initialize data scaling values.

Note

We compute and store local and global normalization parameters independent of settings.

Parameters

df (pd.DataFrame) – data to compute normalization parameters from.
normalize (str) –
Type of normalization to apply to the time series.

options:

auto (default): soft, unless the time series is binary, in which case minmax is applied.

off: bypasses data normalization

minmax: scales the minimum value to 0.0 and the maximum value to 1.0

standardize: zero-centers and divides by the standard deviation

soft: scales the minimum value to 0.0 and the 95th quantile to 1.0

soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors
config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values)
config_events (configure.ConfigEvents) – user specified events configs
config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs
global_normalization (bool) –
True: sets global modeling training with global normalization

False: sets global modeling training with local normalization
global_time_normalization (bool) –
True: normalize time globally across all time series

False: normalize time locally for each time series

(only valid in case of global modeling - local normalization)

Returns

OrderedDict – nested dict with data_params for each dataset where each contains
OrderedDict – ShiftScale entries containing shift and scale parameters for each column

neuralprophet.df_utils.join_dfs_after_data_drop(predicted, df, merge=False)

Creates the intersection between df and predicted, removing any dates that have been imputed and dropped in NeuralProphet.predict().

Parameters

df (pd.DataFrame) – dataframe containing column ds, y with all data
predicted (pd.DataFrame) – output dataframe of NeuralProphet.predict.
merge (bool) – whether to merge predicted and df into one dataframe.
Options
- (default) False: Returns separate dataframes
- True: Merges predicted and df into one dataframe

Returns

dataframe with the imputed and dropped dates removed

Return type

pd.DataFrame

neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, config_events: ConfigEvents, config_regressors: ConfigFutureRegressors, events_df=None, regressors_df=None)

Extends df by periods steps into the future.

Parameters

df_columns (pd.DataFrame) – Dataframe columns
last_date (pd.Datetime) – last history date
periods (int) – number of future steps to predict
freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as D or M
config_events (configure.ConfigEvents) – User specified events configs
events_df (pd.DataFrame) – containing column ds and event
config_regressors (configure.ConfigFutureRegressors) – configuration for user specified regressors
regressors_df (pd.DataFrame) – containing column ds and one column for each of the external regressors

Returns

input df with ds extended into future, and y set to None

Return type

pd.DataFrame
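The core of the extension step is generating `periods` new datestamps after the last history date, with y left empty. A sketch under simplifying assumptions: rows are plain dicts, a fixed `timedelta` replaces the frequency string, and events and regressors are ignored. `make_future_rows` is a hypothetical name.

```python
from datetime import date, timedelta


def make_future_rows(last_date, periods, step=timedelta(days=1)):
    """Return `periods` future rows after last_date, each with y set to None."""
    return [{"ds": last_date + step * i, "y": None} for i in range(1, periods + 1)]


rows = make_future_rows(date(2024, 1, 31), periods=2)
```

The first future row starts one step after last_date, so history and future do not share a datestamp.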

neuralprophet.df_utils.merge_dataframes(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Joins dataframes for procedures such as splitting data, setting auto seasonalities, and others.

Parameters: df (pd.DataFrame) – containing column ds, y, and ID with data
Returns: Dataframe with concatenated time series (sorted ‘ds’, duplicates removed, index reset)
Return type: pd.DataFrame

neuralprophet.df_utils.normalize(df, data_params)

Applies data scaling factors to df using data_params.

Parameters

df (pd.DataFrame) – with columns ds, y, (and potentially more regressors)
data_params (OrderedDict) – scaling values, as returned by init_data_params with ShiftScale entries containing shift and scale parameters

Returns

normalized dataframes

Return type

pd.DataFrame

neuralprophet.df_utils.prep_or_copy_df(df: pandas.core.frame.DataFrame) → tuple[pandas.core.frame.DataFrame, bool, bool, list[str]]

Copies df if it contains the ID column. Creates an ID column with ‘__df__’ if it is a df with a single time series.

Parameters

df (pd.DataFrame) – df or dict containing data

Returns

pd.DataFrame – df with ID col
bool – whether the ID col was present
bool – whether it is a single time series
list – list of IDs

neuralprophet.df_utils.return_df_in_original_format(df, received_ID_col=False, received_single_time_series=True)

Return dataframe in the original format.

Parameters

df (pd.DataFrame) – df with data
received_ID_col (bool) – whether the ID col was present
received_single_time_series (bool) – whether it is a single time series

Returns

original input format

Return type

pd.DataFrame

neuralprophet.df_utils.split_considering_timestamp(df, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)

Splits timeseries into train and validation sets according to given threshold_time_stamp.

Parameters

df (pd.DataFrame) – data with column ds, y, and ID
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
threshold_time_stamp (str) – time stamp boundary that defines splitting of data

Returns

pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data

neuralprophet.df_utils.split_df(df: pandas.core.frame.DataFrame, n_lags: int, n_forecasts: int, valid_p: float = 0.2, inputs_overbleed: bool = True, local_split: bool = False)

Splits timeseries df into train and validation sets.

Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.

Parameters

df (pd.DataFrame) – dataframe containing column ds, y, and optionally ID with all data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
local_split (bool) – when set to True, each episode from a dict of dataframes will be split locally

Returns

pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data
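The effect of inputs_overbleed can be sketched on row positions of a single series. This is a simplified illustration, not the library's exact index arithmetic (it ignores the n_forecasts adjustments): validation takes the last `valid_p` share of rows, or `valid_p` rows if it is greater than 1, and with overbleed enabled the validation set starts n_lags rows early, so its first samples can use the last training rows as lagged inputs while those rows never serve as validation targets. `split_indices` is a hypothetical name.

```python
def split_indices(n_rows, n_lags, valid_p=0.2, inputs_overbleed=True):
    """Return (train_rows, val_rows) as ranges of row positions."""
    if valid_p < 1:
        n_valid = max(1, int(n_rows * valid_p))  # fraction of rows
    else:
        n_valid = int(valid_p)  # absolute number of rows
    split = n_rows - n_valid
    # With overbleed, validation also sees the last n_lags training rows as inputs.
    val_start = split - n_lags if inputs_overbleed else split
    return range(0, split), range(max(0, val_start), n_rows)


train, val = split_indices(100, n_lags=5, valid_p=0.25)
```

With overbleed disabled, the first validation samples must draw their lags from validation rows themselves, so fewer rows can serve as validation targets.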

neuralprophet.df_utils.unfold_dict_of_folds(folds_dict, k)

Converts a dict of folds into the typical format of train and validation folds.

Parameters

folds_dict (dict) – dict of folds
k (int) – number of folds initially set

Returns

training data

validation data

Return type

list of k tuples [(df_train, df_val), …]
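The conversion regroups per-ID folds by fold index. A sketch assuming `folds_dict` maps each ID to a list of k `(train, val)` pairs, with lists of rows standing in for DataFrames (with DataFrames, the analogue of `extend` would be `pd.concat`). `unfold_dict_of_folds_sketch` is a hypothetical name.

```python
def unfold_dict_of_folds_sketch(folds_dict, k):
    """Turn {ID: [(train, val), ...]} into [(train_all, val_all), ...]."""
    folds = []
    for i in range(k):
        train_all, val_all = [], []
        for per_id_folds in folds_dict.values():
            # Collect fold i of every series into one combined fold.
            train, val = per_id_folds[i]
            train_all.extend(train)
            val_all.extend(val)
        folds.append((train_all, val_all))
    return folds


folds_dict = {
    "a": [([1], [2]), ([1, 2], [3])],
    "b": [([10], [20]), ([10, 20], [30])],
}
folds = unfold_dict_of_folds_sketch(folds_dict, k=2)
```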