Core Module Documentation¶
- class neuralprophet.df_utils.ShiftScale(shift: float = 0.0, scale: float = 1.0)¶
- neuralprophet.df_utils.add_missing_dates_nan(df, freq)¶
Fills missing datetimes in ds, with NaN for all other columns.
- Parameters
df (pd.DataFrame) – with column ds datetimes
freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M
- Returns
dataframe without date gaps, but with NaN values in the added rows
- Return type
pd.DataFrame
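The gap-filling behaviour described above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name fill_missing_dates is hypothetical, and it assumes ds is already a datetime column without duplicates.

```python
import pandas as pd


def fill_missing_dates(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Reindex df onto a gap-free ds grid; rows that are added hold NaN elsewhere."""
    # Build the complete grid between the first and last observed timestamp.
    full_range = pd.date_range(start=df["ds"].min(), end=df["ds"].max(), freq=freq)
    return (
        df.set_index("ds")
        .reindex(full_range)   # missing timestamps become all-NaN rows
        .rename_axis("ds")     # restore the column name lost by reindexing
        .reset_index()
    )


df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]),
        "y": [1.0, 2.0, 5.0],
    }
)
filled = fill_missing_dates(df, freq="D")
print(filled)
```

The two missing days (January 3 and 4) appear as rows whose y is NaN, ready for a later imputation step.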
- neuralprophet.df_utils.check_dataframe(df, check_y=True, covariates=None, regressors=None, events=None)¶
Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.
- Parameters
df (pd.DataFrame or dict) – containing column ds
check_y (bool) – if df must have series values (set to True if training or predicting with autoregression)
covariates (list or dict) – covariate column names
regressors (list or dict) – regressor column names
events (list or dict) – event column names
- Returns
checked dataframe
- Return type
pd.DataFrame or dict
- neuralprophet.df_utils.check_single_dataframe(df, check_y, covariates, regressors, events)¶
Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.
- Parameters
df (pd.DataFrame) – with column ds
check_y (bool) – if df must have series values (True if training or predicting with autoregression)
covariates (list or dict) – covariate column names
regressors (list or dict) – regressor column names
events (list or dict) – event column names
- Returns
- Return type
pd.DataFrame
- neuralprophet.df_utils.compare_dict_keys(dict_1, dict_2, name_dict_1, name_dict_2)¶
Compare keys of two different dicts (e.g., events and dataframes).
- Parameters
dict_1 (dict) – first dict
dict_2 (dict) – second dict
name_dict_1 (str) – name of first dict
name_dict_2 (str) – name of second dict
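A key comparison of this kind could look like the sketch below. The source does not state what happens on a mismatch, so raising a ValueError is an assumption, and compare_keys is a hypothetical stand-in rather than the library function.

```python
def compare_keys(dict_1, dict_2, name_dict_1, name_dict_2):
    """Raise ValueError if the two dicts do not share exactly the same keys."""
    only_in_1 = sorted(set(dict_1) - set(dict_2))
    only_in_2 = sorted(set(dict_2) - set(dict_1))
    if only_in_1 or only_in_2:
        # Assumption: a mismatch is treated as an error and reported by name.
        raise ValueError(
            f"Keys of {name_dict_1} and {name_dict_2} differ: "
            f"only in {name_dict_1}: {only_in_1}; only in {name_dict_2}: {only_in_2}"
        )


frames = {"store_a": [], "store_b": []}
events = {"store_a": []}

try:
    compare_keys(frames, events, "dataframes", "events")
    mismatch = None
except ValueError as err:
    mismatch = str(err)
print(mismatch)
```

The name arguments exist only to make the error message readable when several dicts are passed around.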
- neuralprophet.df_utils.convert_events_to_features(df, events_config, events_df)¶
Converts events information into binary features of the df.
- Parameters
df (pd.DataFrame) – Dataframe with columns ds (datestamps) and y (time series values)
events_config (OrderedDict) – User-specified events configs
events_df (pd.DataFrame) – containing columns ds and event
- Returns
input df with columns for user-specified features
- Return type
pd.DataFrame
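To make "binary features" concrete, the sketch below adds one 0/1 column per event name. It is an illustration under assumptions: the helper add_event_columns is hypothetical and takes a plain list of event names, whereas the documented function receives an events_config OrderedDict.

```python
import pandas as pd


def add_event_columns(df, event_names, events_df):
    """Add one float column per event: 1.0 on dates where it occurs, else 0.0."""
    out = df.copy()
    for name in event_names:
        # Dates on which this particular event takes place.
        dates = events_df.loc[events_df["event"] == name, "ds"]
        out[name] = out["ds"].isin(dates).astype(float)
    return out


df = pd.DataFrame(
    {
        "ds": pd.date_range("2022-01-01", periods=5, freq="D"),
        "y": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
events_df = pd.DataFrame(
    {
        "event": ["promo", "promo", "holiday"],
        "ds": pd.to_datetime(["2022-01-02", "2022-01-04", "2022-01-03"]),
    }
)
features = add_event_columns(df, ["promo", "holiday"], events_df)
print(features)
```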
- neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)¶
Convert numeric frequencies into frequency tags
- Parameters
freq_num (int) – numeric values of delta in ms
initial_time_stamp (str) – initial time stamp of data
- Returns
frequency tag
- Return type
str
- neuralprophet.df_utils.convert_str_to_num_freq(freq_str)¶
Convert frequency tags into numeric delta in ms
- Parameters
freq_str (str) – frequency tag
- Returns
frequency numeric delta in ms
- Return type
numeric
- neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0)¶
Splits data in k folds for crossvalidation.
- Parameters
df (pd.DataFrame) – data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
k (int) – number of CV folds
fold_pct (float) – percentage of overall samples to be in each fold
fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)
- Returns
training data
validation data
- Return type
list of k tuples [(df_train, df_val), …]
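The interaction of k, fold_pct and fold_overlap_pct is easiest to see on row indices. The sketch below is a simplification under stated assumptions: it ignores n_lags and n_forecasts, treats everything before a validation fold as its training data, and uses a hypothetical helper name.

```python
def cv_fold_bounds(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Return k (val_start, val_end) pairs, oldest fold first.

    For each pair, training data is rows[:val_start] and validation data is
    rows[val_start:val_end].
    """
    samples_fold = max(1, int(fold_pct * n_samples))
    samples_overlap = int(fold_overlap_pct * samples_fold)
    step = samples_fold - samples_overlap
    if step <= 0:
        raise ValueError("fold_overlap_pct must leave at least one new sample per fold")

    folds = []
    val_end = n_samples
    for _ in range(k):
        val_start = val_end - samples_fold
        if val_start <= 0:
            raise ValueError("not enough data for the requested folds")
        folds.append((val_start, val_end))
        val_end -= step  # move the window back, keeping the requested overlap
    return folds[::-1]


print(cv_fold_bounds(100, k=3, fold_pct=0.1))
print(cv_fold_bounds(100, k=3, fold_pct=0.1, fold_overlap_pct=0.5))
```

With 100 samples and fold_pct=0.1, each validation fold holds 10 samples; an overlap of 0.5 makes consecutive folds share 5 of them.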
- neuralprophet.df_utils.data_params_definition(df, normalize, covariates_config=None, regressor_config=None, events_config=None)¶
Initialize data scaling values.
Note
We do a z normalization on the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.
- Parameters
df (pd.DataFrame) – Time series to compute normalization parameters from.
normalize (str) –
Type of normalization to apply to the time series.
options:
soft (default), unless the time series is binary, in which case minmax is applied.
off bypasses data normalization
minmax scales the minimum value to 0.0 and the maximum value to 1.0
standardize zero-centers and divides by the standard deviation
soft scales the minimum value to 0.0 and the 95th quantile to 1.0
soft1 scales the minimum value to 0.1 and the 90th quantile to 0.9
covariates_config (OrderedDict) – extra regressors with sub_parameters
normalize – data normalization
regressor_config (OrderedDict) – extra regressors (with known future values) with sub_parameters normalize (bool)
events_config (OrderedDict) – user-specified events configs
- Returns
scaling values with ShiftScale entries containing shift and scale parameters.
- Return type
OrderedDict
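The normalization options listed above all reduce to a shift and a scale applied as (value - shift) / scale. The sketch below derives both numbers from the option descriptions. It is an illustration, not the library code: the ShiftScale dataclass here is a local stand-in with the documented defaults, the helper names are hypothetical, and the use of the population standard deviation and of inclusive quantile interpolation are assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ShiftScale:
    # Local stand-in mirroring the documented defaults (shift=0.0, scale=1.0).
    shift: float = 0.0
    scale: float = 1.0


def percentile(values, pct):
    """pct-th percentile (integer 1..99) with linear interpolation between points."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


def shift_scale_for(values, norm_type):
    """Derive shift and scale for one column from the option descriptions."""
    lo = min(values)
    if norm_type == "off":
        return ShiftScale()
    if norm_type == "minmax":
        return ShiftScale(lo, max(values) - lo)
    if norm_type == "standardize":
        # Assumption: population standard deviation.
        return ShiftScale(statistics.fmean(values), statistics.pstdev(values))
    if norm_type == "soft":
        return ShiftScale(lo, percentile(values, 95) - lo)
    if norm_type == "soft1":
        # min -> 0.1 and 90th percentile -> 0.9, so their distance spans 0.8.
        scale = (percentile(values, 90) - lo) / 0.8
        return ShiftScale(lo - 0.1 * scale, scale)
    raise ValueError(f"unknown normalization: {norm_type}")


def apply(values, params):
    return [(v - params.shift) / params.scale for v in values]


y = [float(v) for v in range(21)]  # 0.0 ... 20.0
soft = apply(y, shift_scale_for(y, "soft"))
soft1 = apply(y, shift_scale_for(y, "soft1"))
print(soft[0], soft[19], soft[20])  # values above the 95th percentile exceed 1.0
```

The soft variants deliberately let the largest values land above 1.0, which keeps a few outliers from compressing the rest of the series.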
- neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)¶
Splits data in two sets of k folds for crossvalidation on validation and test data.
- Parameters
df (pd.DataFrame) –
n_lags (int) –
n_forecasts (int) –
k (int) –
valid_pct (float) –
test_pct (float) –
- Returns
elements same as crossvalidation_split_df() returns
- Return type
tuple of k tuples [(folds_val, folds_test), …]
- neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)¶
Adds missing dates, fills missing values with linear imputation or trend.
- Parameters
series (pd.Series) – series with nan to be filled in.
limit_linear (int) –
maximum number of missing values to impute.
Note
because imputation is done in both directions, this value is effectively doubled.
rolling (int) –
maximal number of missing values to impute.
Note
window width is rolling + 2*limit_linear
- Returns
manipulated dataframe containing filled values
- Return type
pd.DataFrame
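The two-stage imputation described above can be sketched with pandas. This is a minimal illustration under assumptions: the helper name fill_gaps is hypothetical, the rolling window is centered, and the min_periods value of 2 * limit_linear is a choice made for this sketch.

```python
import pandas as pd


def fill_gaps(series: pd.Series, limit_linear: int, rolling: int):
    """Linear interpolation first, then a centered rolling mean for what is left."""
    # Stage 1: interpolate at most limit_linear values from each side of a gap.
    filled = series.interpolate(method="linear", limit=limit_linear, limit_direction="both")
    still_na = filled.isna()
    # Stage 2: window width is rolling + 2 * limit_linear, as noted above.
    window = rolling + 2 * limit_linear
    rolling_avg = filled.rolling(window, min_periods=2 * limit_linear, center=True).mean()
    filled.loc[still_na] = rolling_avg[still_na]
    return filled, int(filled.isna().sum())


nan = float("nan")
series = pd.Series([1.0, 2.0, nan, 4.0, 5.0, nan, nan, nan, nan, nan, 11.0, 12.0, 13.0])
filled, remaining = fill_gaps(series, limit_linear=1, rolling=3)
print(filled.tolist(), remaining)
```

In this example the single-value gap and the two edges of the five-value gap are interpolated linearly, and the three values in the middle of the long gap come from the rolling mean.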
- neuralprophet.df_utils.find_time_threshold(df_dict, n_lags, valid_p, inputs_overbleed)¶
Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.
- Parameters
df_dict (dict) – dict of data
n_lags (int) – identical to NeuralProphet
valid_p (float) – fraction (0,1) of data to use for holdout validation set
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
- Returns
time stamp threshold that defines the boundary between the train and validation sets.
- Return type
str
- neuralprophet.df_utils.get_dist_considering_two_freqs(dist)¶
Add occurrences of the two most common frequencies
Note
Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).
- Parameters
dist (list) – list of occurrences of frequencies
- Returns
sum of the occurrences of the two most common frequencies
- Return type
numeric
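Monthly data shows why the two most common step sizes are summed: consecutive month starts are 28, 30 or 31 days apart, so no single delta dominates. A small sketch with the standard library (the helper name is hypothetical):

```python
from collections import Counter
from datetime import date


def two_most_common_total(dist):
    """Sum the counts of the two most frequent step sizes."""
    return sum(sorted(dist, reverse=True)[:2])


month_starts = [date(2021, month, 1) for month in range(1, 13)]
deltas = [(b - a).days for a, b in zip(month_starts, month_starts[1:])]
counts = Counter(deltas)            # 31 days: 6 times, 30 days: 4 times, 28 days: once
dist = list(counts.values())

top_share = max(dist) / len(deltas)
combined_share = two_most_common_total(dist) / len(deltas)
print(counts, round(top_share, 2), round(combined_share, 2))
```

The most common delta alone covers only 6 of 11 steps, while the two most common together cover 10 of 11.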
- neuralprophet.df_utils.get_freq_dist(ds_col)¶
Get frequency distribution of ds column.
- Parameters
ds_col (pd.DataFrame) – ds column of dataframe
- Returns
numeric delta values (ms) and distribution of frequency counts
- Return type
tuple
- neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)¶
Automatically infers frequency of dataframe or dict of dataframes.
- Parameters
df (pd.DataFrame) – Dataframe with columns ds (datestamps) and y (time series values)
freq (str) –
Data step sizes, i.e. frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS, or auto (default) to automatically set frequency.
n_lags (int) – identical to NeuralProphet
min_freq_percentage (float) – threshold for defining major frequency of data (default: 0.7)
- Returns
Valid frequency tag according to major frequency.
- Return type
str
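The role of min_freq_percentage can be shown with a small standard-library sketch: count the steps between consecutive timestamps and accept the most common one only if it covers a large enough share. The helper infer_major_delta is hypothetical, returns a timedelta rather than a frequency tag, and raising a ValueError when no step dominates is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_major_delta(timestamps, min_freq_percentage=0.7):
    """Return the dominant step between consecutive timestamps, or raise."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    delta, count = Counter(deltas).most_common(1)[0]
    if count / len(deltas) < min_freq_percentage:
        raise ValueError("no step size reaches the required share of the data")
    return delta


start = datetime(2022, 1, 1)
# Hourly data with two readings missing: one 3-hour step among 1-hour steps.
hourly = [start + timedelta(hours=h) for h in range(24) if h not in (5, 6)]
major = infer_major_delta(hourly)

# Alternating 1-hour and 2-hour steps: neither reaches 70 percent.
irregular = [start + timedelta(hours=h) for h in (0, 1, 3, 4, 6, 7, 9)]
try:
    infer_major_delta(irregular)
    irregular_ok = True
except ValueError:
    irregular_ok = False
print(major, irregular_ok)
```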
- neuralprophet.df_utils.init_data_params(df_dict, normalize='auto', covariates_config=None, regressor_config=None, events_config=None, global_normalization=False, global_time_normalization=False)¶
Initialize data scaling values.
Note
We compute and store local and global normalization parameters independent of settings.
- Parameters
df_dict (dict) – dict of DataFrames to compute normalization parameters from.
normalize (str) –
Type of normalization to apply to the time series.
options:
soft (default), unless the time series is binary, in which case minmax is applied.
off bypasses data normalization
minmax scales the minimum value to 0.0 and the maximum value to 1.0
standardize zero-centers and divides by the standard deviation
soft scales the minimum value to 0.0 and the 95th quantile to 1.0
soft1 scales the minimum value to 0.1 and the 90th quantile to 0.9
covariates_config (OrderedDict) – extra regressors with sub_parameters
regressor_config (OrderedDict) – extra regressors (with known future values)
events_config (OrderedDict) – user-specified events configs
global_normalization (bool) –
True: sets global modeling training with global normalization
False: sets global modeling training with local normalization
global_time_normalization (bool) –
True: normalize time globally across all time series
False: normalize time locally for each time series
(only valid in case of global modeling with local normalization)
- Returns
OrderedDict – nested dict with data_params for each dataset where each contains
OrderedDict – ShiftScale entries containing shift and scale parameters for each column
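The difference between local and global normalization is whether parameters are computed per series or once over all series pooled together. A minimal sketch using min-max scaling on plain lists (the names and the choice of min-max are illustrative assumptions):

```python
def minmax_params(values):
    """Return (shift, scale) so that (v - shift) / scale maps values onto [0, 1]."""
    lo = min(values)
    return lo, max(values) - lo


series = {
    "small": [1.0, 2.0, 3.0],
    "large": [100.0, 200.0, 300.0],
}

# Local normalization: one (shift, scale) pair per time series.
local_params = {name: minmax_params(values) for name, values in series.items()}

# Global normalization: a single pair computed over all series pooled together.
pooled = [v for values in series.values() for v in values]
global_params = minmax_params(pooled)

print(local_params)
print(global_params)
```

With local parameters both series span the full [0, 1] range; with global parameters the small series is squeezed near 0, which preserves relative magnitudes across series.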
- neuralprophet.df_utils.join_dataframes(df_dict)¶
Join dict of dataframes preserving the episodes so it can be recovered later.
- Parameters
df_dict (dict of pd.DataFrame) – containing columns ds, y with training data
- Returns
pd.DataFrame – Dataframe with concatenated episodes
list – keys of each timestamp
- neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, events_config=None, events_df=None, regressor_config=None, regressors_df=None)¶
Extends df by periods steps into the future.
- Parameters
df_columns (pd.DataFrame) – Dataframe columns
last_date (pd.Datetime) – last history date
periods (int) – number of future steps to predict
freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as D or M
events_config (OrderedDict) – User-specified events configs
events_df (pd.DataFrame) – containing columns ds and event
regressor_config (OrderedDict) – configuration for user-specified regressors
regressors_df (pd.DataFrame) – containing column ds and one column for each of the external regressors
- Returns
input df with ds extended into the future, and y set to None
- Return type
pd.DataFrame
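Extending ds past the last history date can be sketched with pd.date_range. This is an illustration only: the helper future_dates is hypothetical, and events and regressors are left out.

```python
import pandas as pd


def future_dates(last_date, periods, freq):
    """Return the next `periods` timestamps strictly after last_date."""
    # Generate one extra step so that last_date itself can be dropped.
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    return dates[dates > last_date][:periods]


last_date = pd.Timestamp("2022-01-31")
future_df = pd.DataFrame({"ds": future_dates(last_date, periods=3, freq="D"), "y": None})
print(future_df)
```

Filtering on dates later than last_date, rather than slicing off the first element, also behaves sensibly when last_date does not fall exactly on the frequency grid.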
- neuralprophet.df_utils.maybe_get_single_df_from_df_dict(df_dict, received_unnamed_df=True)¶
Extract dataframe from single length dict if placeholder-named.
- Parameters
df_dict (dict) – dict with potentially single pd.DataFrame
received_unnamed_df (bool) – whether the input was unnamed
- Returns
original input format
- Return type
pd.DataFrame or dict
- neuralprophet.df_utils.normalize(df, data_params)¶
Applies data scaling factors to df using data_params.
- Parameters
df (pd.DataFrame) – with columns ds, y (and potentially more regressors)
data_params (OrderedDict) – scaling values, as returned by init_data_params, with ShiftScale entries containing shift and scale parameters
- Returns
normalized dataframes
- Return type
pd.DataFrame
- neuralprophet.df_utils.prep_copy_df_dict(df)¶
Creates or copies a df_dict based on the df input. It either converts a pd.DataFrame to a dict or copies the input if it is already a dict.
- Parameters
df (pd.DataFrame,dict) – containing df or dict with group of dfs
- Returns
pd.DataFrames – dict of dataframes or copy of dict of dataframes
bool – whether the input was unnamed
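Together with maybe_get_single_df_from_df_dict, this forms a round trip: a single unnamed input is wrapped in a one-entry dict for processing and unwrapped again afterwards. The sketch below shows the pattern on generic objects. The placeholder key, the helper names and the shallow copy are assumptions made for illustration.

```python
PLACEHOLDER = "__df__"  # hypothetical placeholder name for an unnamed input


def to_dict(data):
    """Wrap a single object in a dict; copy the dict if one was passed."""
    if isinstance(data, dict):
        return dict(data), False          # shallow copy, input was already named
    return {PLACEHOLDER: data}, True      # input was unnamed


def from_dict(data_dict, received_unnamed):
    """Undo to_dict: hand back the single object if the input was unnamed."""
    if received_unnamed and list(data_dict) == [PLACEHOLDER]:
        return data_dict[PLACEHOLDER]
    return data_dict


single = ["row 1", "row 2"]
wrapped, unnamed = to_dict(single)
restored = from_dict(wrapped, unnamed)

named = {"a": ["row 1"], "b": ["row 2"]}
wrapped_named, unnamed_named = to_dict(named)
print(unnamed, restored is single, unnamed_named)
```

Carrying the boolean alongside the dict lets callers return results in the same format the user originally supplied.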
- neuralprophet.df_utils.recover_dataframes(df_joined, episodes)¶
Recover dict of dataframes according to episodes.
- Parameters
df_joined (pd.DataFrame) – concatenated dataframe containing columns ds, y with training data
episodes (List) – containing the episodes from each timestamp
- Returns
Original dict before concatenation
- Return type
pd.Dataframe
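Joining and recovering are inverse operations: the list of episode keys, one per row, is what makes the concatenation reversible. A sketch of that round trip with pandas follows. The helper names join and recover are hypothetical, and the approach of labelling each row with its dict key is an assumption based on the descriptions above.

```python
import pandas as pd


def join(df_dict):
    """Concatenate all dataframes and remember which key each row came from."""
    episodes = [name for name, df in df_dict.items() for _ in range(len(df))]
    joined = pd.concat(list(df_dict.values()), ignore_index=True)
    return joined, episodes


def recover(df_joined, episodes):
    """Split the concatenated dataframe back into a dict using the row labels."""
    labels = pd.Series(episodes, index=df_joined.index)
    return {
        name: df_joined[labels == name].reset_index(drop=True)
        for name in dict.fromkeys(episodes)  # unique keys in first-seen order
    }


df_dict = {
    "a": pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=2, freq="D"), "y": [1.0, 2.0]}),
    "b": pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=3, freq="D"), "y": [10.0, 20.0, 30.0]}),
}
joined, episodes = join(df_dict)
recovered = recover(joined, episodes)
print(episodes)
```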
- neuralprophet.df_utils.split_considering_timestamp(df_dict, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)¶
Splits timeseries into train and validation sets according to given threshold_time_stamp.
- Parameters
df_dict (dict) – dataframe or dict of dataframes containing columns ds, y with all data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
threshold_time_stamp (str) – time stamp boundary that defines splitting of data
- Returns
pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data
- neuralprophet.df_utils.split_df(df, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True, local_split=False)¶
Splits timeseries df into train and validation sets.
Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.
- Parameters
df (pd.DataFrame, dict) – dataframe or dict of dataframes containing columns ds, y with all data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
local_split (bool) – when set to true, each episode from a dict of dataframes will be split locally
- Returns
pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data
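How valid_p and inputs_overbleed translate into row boundaries can be worked out on indices. The sketch below is one possible accounting, stated as an assumption rather than the library's exact logic: a dataframe with n rows yields n - n_lags - n_forecasts + 1 forecast samples, and with input overbleed the validation set reuses the last n_lags training rows as inputs only.

```python
def split_bounds(n_rows, n_lags, n_forecasts, valid_p=0.2, inputs_overbleed=True):
    """Return (train_end, val_start): train = rows[:train_end], val = rows[val_start:]."""
    # Total forecast samples available across both sets (assumed accounting).
    n_samples = n_rows - n_lags + 2 - 2 * n_forecasts
    if not inputs_overbleed:
        n_samples -= n_lags  # validation cannot borrow training rows as inputs
    if 0.0 < valid_p < 1.0:
        n_valid = max(1, int(n_samples * valid_p))  # fraction of samples
    else:
        n_valid = int(valid_p)                      # absolute number of samples
    n_train = n_samples - n_valid
    if n_train < 1:
        raise ValueError("not enough data left for training")
    train_end = n_train + n_lags + n_forecasts - 1
    val_start = train_end - n_lags if inputs_overbleed else train_end
    return train_end, val_start


print(split_bounds(100, n_lags=3, n_forecasts=2, valid_p=0.2))
print(split_bounds(100, n_lags=3, n_forecasts=2, valid_p=0.2, inputs_overbleed=False))
print(split_bounds(100, n_lags=3, n_forecasts=2, valid_p=10))
```

In the first call the validation rows start 3 rows before the training rows end: those 3 rows serve as lagged inputs for the first validation sample but never as validation targets.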