Core Module Documentation
- class neuralprophet.df_utils.ShiftScale(shift: 'float' = 0.0, scale: 'float' = 1.0)
- neuralprophet.df_utils.add_missing_dates_nan(df, freq)
Fills missing datetimes in ds, with NaN for all other columns except ID.
- Parameters
df (pd.DataFrame) – dataframe with column ds containing datetimes
freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M
- Returns
dataframe without date gaps, with NaN values in the added rows
- Return type
pd.DataFrame
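The gap-filling described above can be sketched with pandas by reindexing on a complete date range. This is a minimal sketch, not the library function: the helper name fill_date_gaps_with_nan is hypothetical, and it assumes df holds a single series with unique ds values.

```python
import pandas as pd


def fill_date_gaps_with_nan(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical helper: insert missing datestamps, leaving NaN in other columns."""
    # Complete, gap-free range of datestamps from the first to the last ds.
    full_range = pd.date_range(start=df["ds"].min(), end=df["ds"].max(), freq=freq)
    # Reindex on ds: existing rows keep their values, new rows get NaN.
    out = df.set_index("ds").reindex(full_range).rename_axis("ds").reset_index()
    # The ID is constant within one series, so restore it on the added rows.
    if "ID" in df.columns:
        out["ID"] = df["ID"].iloc[0]
    return out


df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]),
        "y": [1.0, 2.0, 5.0],
        "ID": "__df__",
    }
)
filled = fill_date_gaps_with_nan(df, freq="D")
print(filled)
```

Here 2022-01-03 and 2022-01-04 are added with y set to NaN, while ID is carried over.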
- neuralprophet.df_utils.add_quarter_condition(df: pandas.core.frame.DataFrame)
Adds columns for conditional seasonalities to the df.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
- Returns
dataframe with added columns for conditional seasonalities
Note
Quarters are defined for the northern hemisphere.
- Return type
pd.DataFrame
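Conceptually, each added column acts as a 0/1 condition that switches a seasonality on or off for a given row. The sketch below shows one way such indicators could be derived from the month of a datestamp; the season names and month groupings are assumptions for illustration and may not match the columns add_quarter_condition actually creates.

```python
from datetime import date

# Hypothetical month-to-season mapping (northern hemisphere); the column
# names added by add_quarter_condition may differ.
SEASON_MONTHS = {
    "winter": {12, 1, 2},
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "fall": {9, 10, 11},
}


def quarter_condition_flags(ds: date) -> dict:
    # One 0/1 indicator per season; exactly one of them is 1 for any date.
    return {season: int(ds.month in months) for season, months in SEASON_MONTHS.items()}


flags = quarter_condition_flags(date(2022, 7, 15))
print(flags)  # {'winter': 0, 'spring': 0, 'summer': 1, 'fall': 0}
```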
- neuralprophet.df_utils.add_weekday_condition(df: pandas.core.frame.DataFrame)
Adds columns for conditional seasonalities to the df.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
- Returns
dataframe with added columns for conditional seasonalities
- Return type
pd.DataFrame
- neuralprophet.df_utils.check_dataframe(df: pandas.core.frame.DataFrame, check_y: bool = True, covariates=None, regressors=None, events=None, seasonalities=None, future: Optional[bool] = None) → Tuple[pandas.core.frame.DataFrame, List, List]
Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.
- Parameters
df (pd.DataFrame) – dataframe containing column ds
check_y (bool) – whether df must contain series values (y); set to True when training or when predicting with autoregression
covariates (list or dict) – covariate column names
regressors (list or dict) – regressor column names
events (list or dict) – event column names
seasonalities (list or dict) – seasonality column names
future (bool) – whether df is a future dataframe
- Returns
checked dataframe, together with the two lists given in the return annotation
- Return type
Tuple[pd.DataFrame, List, List]
- neuralprophet.df_utils.convert_events_to_features(df, config_events: ConfigEvents, events_df)
Converts event information into binary features of the df.
- Parameters
df (pd.DataFrame) – dataframe with columns ds (datestamps) and y (time series values)
config_events (configure.ConfigEvents) – user-specified event configs
events_df (pd.DataFrame) – dataframe containing columns ds and event
- Returns
input df with columns for user-specified features
- Return type
pd.DataFrame
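The core idea of turning events into binary features can be sketched without pandas: each event name becomes a column that is 1 on that event's dates and 0 elsewhere. This is a simplified sketch with a hypothetical helper name; it ignores any additional settings carried in config_events.

```python
from datetime import date


def events_to_binary_features(dates, events):
    """Hypothetical sketch: one 0/1 column per event name.

    dates  -- list of datestamps (the ds column of df)
    events -- list of (ds, event_name) pairs (the ds/event columns of events_df)
    """
    # Group event dates by event name for fast membership tests.
    dates_by_event = {}
    for ds, name in events:
        dates_by_event.setdefault(name, set()).add(ds)
    # A row gets 1 in an event column if its ds is one of that event's dates.
    return {
        name: [int(ds in event_dates) for ds in dates]
        for name, event_dates in dates_by_event.items()
    }


dates = [date(2022, 12, 24), date(2022, 12, 25), date(2022, 12, 26)]
events = [(date(2022, 12, 25), "christmas")]
print(events_to_binary_features(dates, events))  # {'christmas': [0, 1, 0]}
```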
- neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)
Convert numeric frequencies into frequency tags.
- Parameters
freq_num (int) – numeric value of the delta in ms
initial_time_stamp (str) – initial time stamp of data
- Returns
frequency tag
- Return type
str
- neuralprophet.df_utils.convert_str_to_num_freq(freq_str)
Convert frequency tags into numeric deltas in ms.
- Parameters
freq_str (str) – frequency tag
- Returns
frequency numeric delta in ms
- Return type
numeric
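For fixed-length frequencies, the two conversions above amount to a lookup between a tag and its duration in milliseconds. The table and helper names below are hypothetical and cover only a few fixed-length tags; calendar-based tags such as M, Q or Y have no single fixed length and cannot be handled by a plain lookup like this.

```python
# Hypothetical lookup of fixed-length frequency tags to deltas in milliseconds.
MS_PER_TAG = {
    "ms": 1,
    "s": 1_000,
    "min": 60_000,
    "h": 3_600_000,
    "D": 86_400_000,
    "W": 7 * 86_400_000,
}


def str_to_ms(freq_str: str) -> int:
    # Raises KeyError for tags that are not in the table.
    return MS_PER_TAG[freq_str]


def ms_to_str(freq_ms: int) -> str:
    # Reverse lookup; raises KeyError for deltas without a fixed-length tag.
    for tag, ms in MS_PER_TAG.items():
        if ms == freq_ms:
            return tag
    raise KeyError(freq_ms)


print(str_to_ms("D"))  # 86400000
print(ms_to_str(3_600_000))  # h
```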
- neuralprophet.df_utils.create_dict_for_events_or_regressors(df: pandas.core.frame.DataFrame, other_df: Optional[pandas.core.frame.DataFrame], other_df_name: str) → dict
Create a dict for events or regressors according to the input df.
- Parameters
df (pd.DataFrame) – dataframe with columns ds (datestamps) and y (time series values)
other_df (pd.DataFrame) – dataframe with events or regressors
other_df_name (str) – name describing other_df (e.g. ‘events’ or ‘regressors’)
- Returns
dictionary with events or regressors
- Return type
dict
- neuralprophet.df_utils.create_dummy_datestamps(df, freq='S', startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0)
Helper function to create a dummy series of datestamps for equidistant data without a ds column.
- Parameters
df (pd.DataFrame) – dataframe with column ‘y’ and without column ‘ds’
freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as D or M
startyear (int) – year of the first datestamp
startmonth (int) – month of the first datestamp
startday (int) – day of the first datestamp
starthour (int) – hour of the first datestamp
startminute (int) – minute of the first datestamp
startsecond (int) – second of the first datestamp
- Returns
dataframe with dummy equidistant datestamps
- Return type
pd.DataFrame
Examples
Adding dummy datestamps to a dataframe without datestamps. To prepare the dataframe for training, import df_utils and insert your preferred dates.
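>>> import pandas as pd
>>> from neuralprophet import df_utils
>>> df = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=5, freq="D"), "y": [1.0, 2.0, 3.0, 4.0, 5.0]})
>>> df_drop = df.drop("ds", axis=1)
>>> df_dummy = df_utils.create_dummy_datestamps(
...     df_drop, freq="S", startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0
... )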
- neuralprophet.df_utils.create_mask_for_prediction_frequency(prediction_frequency, ds, forecast_lag)
Creates a mask for the yhat array to select the correct values for the prediction frequency. This method is only called in _reshape_raw_predictions_to_forecst_df within NeuralProphet.predict().
- Parameters
prediction_frequency (dict) – identical to NeuralProphet
ds (pd.Series) – datestamps of the predictions
forecast_lag (int) – current forecast lag
- Returns
mask for the yhat array
- Return type
np.array
- neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0, global_model_cv_type='global-time')
Splits data into k folds for crossvalidation.
- Parameters
df (pd.DataFrame) – data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
k (int) – number of CV folds
fold_pct (float) – percentage of overall samples to be in each fold
fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0)
global_model_cv_type (str) – Type of crossvalidation to apply to the time series.
options:
global-time (default): crossvalidation is performed according to a time stamp threshold.
local: each episode is crossvalidated locally (may cause time leakage among different episodes).
intersect: only the time intersection of all the episodes is considered. A considerable amount of data may not be used; however, this approach guarantees an equal number of train/test samples for each episode.
- Returns
training data
validation data
- Return type
list of k tuples [(df_train, df_val), …]
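The fold layout can be pictured as expanding training windows followed by validation folds that step towards the end of the data. The sketch below works on plain sample indices and ignores n_lags and n_forecasts, so it illustrates the idea rather than reproducing the library's arithmetic; cv_fold_ranges is a hypothetical name.

```python
def cv_fold_ranges(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Hypothetical sketch of k expanding-window folds over sample indices.

    Each validation fold holds int(fold_pct * n_samples) samples; consecutive
    validation folds share int(fold_overlap_pct * fold_size) samples, and each
    training range is everything before its validation fold.
    """
    fold_size = max(1, int(fold_pct * n_samples))
    overlap = int(fold_overlap_pct * fold_size)
    if overlap >= fold_size:
        raise ValueError("overlap must be smaller than the fold size")
    step = fold_size - overlap
    # The earliest validation fold starts here; it must leave data for training.
    first_val_start = n_samples - fold_size - (k - 1) * step
    if first_val_start < 1:
        raise ValueError("not enough samples for k folds")
    folds = []
    for i in range(k):
        val_start = first_val_start + i * step
        folds.append((range(0, val_start), range(val_start, val_start + fold_size)))
    return folds


for train, val in cv_fold_ranges(n_samples=100, k=3, fold_pct=0.1):
    print(len(train), val.start, val.stop)
```

With 100 samples, three folds of 10 and no overlap, the validation folds cover samples 70 to 80, 80 to 90 and 90 to 100, and the last fold ends exactly at the end of the data.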
- neuralprophet.df_utils.data_params_definition(df, normalize, config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, local_run_despite_global: Optional[bool] = None)
Initialize data scaling values.
Note
We do a z normalization on the target series y, unlike the original Prophet, which shifts by the minimum and scales by the maximum.
- Parameters
df (pd.DataFrame) – time series to compute normalization parameters from.
normalize (str) – Type of normalization to apply to the time series.
options:
auto (default): soft, unless the time series is binary, in which case minmax is applied.
off: bypasses data normalization
minmax: scales the minimum value to 0.0 and the maximum value to 1.0
standardize: zero-centers and divides by the standard deviation
soft: scales the minimum value to 0.0 and the 95th quantile to 1.0
soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
config_lagged_regressors (configure.ConfigLaggedRegressors) – configurations for lagged regressors
config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) with sub_parameters normalize (bool)
config_events (configure.ConfigEvents) – user-specified event configs
config_seasonality (configure.ConfigSeasonality) – user-specified seasonality configs
- Returns
scaling values with ShiftScale entries containing shift and scale parameters.
- Return type
OrderedDict
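Each normalization option above can be expressed as a shift/scale pair, assuming a value x is normalized as (x - shift) / scale. The sketch below derives such pairs from the documented targets (for example, soft1 maps the minimum to 0.1 and the 90th quantile to 0.9). The quantile method, the use of the population standard deviation, and treating "binary" as exactly two distinct values are all assumptions; zero-width scales are not guarded.

```python
from dataclasses import dataclass
import statistics


@dataclass
class ShiftScale:
    # Assumed convention: normalized = (x - shift) / scale.
    shift: float = 0.0
    scale: float = 1.0


def quantile(values, q):
    # Linear-interpolation quantile; the library may use a different method.
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def shift_scale_for(values, normalize="soft"):
    """Sketch of the documented normalization options (not the library code)."""
    if normalize == "auto":
        # Assumption: "binary" means exactly two distinct values.
        normalize = "minmax" if len(set(values)) == 2 else "soft"
    lowest = min(values)
    if normalize == "off":
        return ShiftScale(0.0, 1.0)
    if normalize == "minmax":
        return ShiftScale(lowest, max(values) - lowest)
    if normalize == "standardize":
        return ShiftScale(statistics.mean(values), statistics.pstdev(values))
    if normalize == "soft":
        # min -> 0.0, 95th quantile -> 1.0
        return ShiftScale(lowest, quantile(values, 0.95) - lowest)
    if normalize == "soft1":
        # min -> 0.1, 90th quantile -> 0.9, i.e. a width of 0.8 in normalized units.
        scale = (quantile(values, 0.90) - lowest) / 0.8
        return ShiftScale(lowest - 0.1 * scale, scale)
    raise ValueError(f"unknown normalize option: {normalize}")


y = [float(v) for v in range(11)]  # 0.0 .. 10.0
for option in ("minmax", "soft", "soft1"):
    p = shift_scale_for(y, option)
    print(option, p, (min(y) - p.shift) / p.scale)
```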
- neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)
Splits data into two sets of k folds for crossvalidation on validation and test data.
- Parameters
df (pd.DataFrame) – data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
k (int) – number of CV folds
valid_pct (float) – percentage of overall samples to be in validation
test_pct (float) – percentage of overall samples to be in test
- Returns
each element has the same format as the return value of crossvalidation_split_df()
- Return type
tuple of k tuples [(folds_val, folds_test), …]
- neuralprophet.df_utils.drop_missing_from_df(df, drop_missing, predict_steps, n_lags)
Drops windows of missing values in df according to the (lagged) samples that are dropped from TimeDataset.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
drop_missing (bool) – identical to NeuralProphet
predict_steps (int) – number of forecast steps, corresponding to n_forecasts in NeuralProphet
n_lags (int) – identical to NeuralProphet
- Returns
dataframe with dropped NaN windows
- Return type
pd.DataFrame
- neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)
Fills missing values in a series: short gaps by linear imputation, remaining gaps by a rolling average.
- Parameters
series (pd.Series) – series with NaN values to be filled in.
limit_linear (int) – maximum number of missing values to impute linearly.
Note
because imputation is done in both directions, this value is effectively doubled.
rolling (int) – maximum number of missing values to impute with the rolling average.
Note
the window width is rolling + 2*limit_linear
- Returns
manipulated dataframe containing filled values
- Return type
pd.DataFrame
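The two-stage imputation described above can be sketched with pandas: first Series.interpolate with a limit applied from both sides of each gap, then a centered rolling mean of width rolling + 2*limit_linear for what is still missing. This is an illustration under those assumptions, not the library implementation; fill_linear_then_rolling is a hypothetical name, and min_periods=1 is an assumed choice.

```python
import pandas as pd


def fill_linear_then_rolling(series: pd.Series, limit_linear: int, rolling: int) -> pd.Series:
    """Sketch: linear interpolation for short gaps, centered rolling mean for the rest."""
    # Step 1: interpolate at most limit_linear values from each side of a gap.
    filled = series.interpolate(method="linear", limit=limit_linear, limit_direction="both")
    # Step 2: fill what is still missing with a centered rolling mean whose
    # window width is rolling + 2 * limit_linear.
    still_missing = filled.isna()
    window = rolling + 2 * limit_linear
    rolling_avg = filled.rolling(window, min_periods=1, center=True).mean()
    filled.loc[still_missing] = rolling_avg.loc[still_missing]
    return filled


nan = float("nan")
s = pd.Series([0.0, nan, nan, nan, 4.0])
out = fill_linear_then_rolling(s, limit_linear=1, rolling=1)
print(out.tolist())
```

In this example, linear imputation fills the first and last missing values (1.0 and 3.0) and the rolling average of width 3 fills the middle one with their mean, 2.0. Gaps wider than the window can stay partly NaN.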
- neuralprophet.df_utils.find_time_threshold(df, n_lags, n_forecasts, valid_p, inputs_overbleed)
Find the time threshold for dividing time series into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.
- Parameters
df (pd.DataFrame) – data with columns ds, y, and ID
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
valid_p (float) – fraction (0,1) of data to use for holdout validation set
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
- Returns
time stamp threshold that defines the boundary between the train and validation sets.
- Return type
str
- neuralprophet.df_utils.find_valid_time_interval_for_cv(df)
Find the time interval of intersection among all the time series.
- Parameters
df (pd.DataFrame) – data with columns ds, y, and ID
- Returns
str – time interval start
str – time interval end
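The intersection of several time ranges starts at the latest first datestamp and ends at the earliest last datestamp. A minimal sketch with a hypothetical helper, using a dict from ID to datestamps instead of a dataframe and returning date objects rather than strings:

```python
from datetime import date


def common_time_interval(ds_by_id):
    """Hypothetical sketch: latest start and earliest end across all series."""
    start = max(min(ds_list) for ds_list in ds_by_id.values())
    end = min(max(ds_list) for ds_list in ds_by_id.values())
    if start > end:
        raise ValueError("the series do not overlap in time")
    return start, end


ds_by_id = {
    "a": [date(2022, 1, 1), date(2022, 1, 10)],
    "b": [date(2022, 1, 5), date(2022, 1, 20)],
}
print(common_time_interval(ds_by_id))  # (datetime.date(2022, 1, 5), datetime.date(2022, 1, 10))
```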
- neuralprophet.df_utils.get_dist_considering_two_freqs(dist)
Add up the occurrences of the two most common frequencies.
Note
Useful for the frequency exceptions (i.e. M, Y, Q, B, and BH).
- Parameters
dist (list) – list of occurrences of frequencies
- Returns
sum of the occurrences of the two most common frequencies
- Return type
numeric
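Monthly data shows why two frequencies are combined: the step between month starts is 31, 30, 28 or 29 days, so no single delta dominates, but the two most common ones together do. The helper below is a hypothetical re-implementation for illustration.

```python
from collections import Counter


def two_most_common_total(dist):
    # Sum of the two largest occurrence counts (hypothetical re-implementation).
    return sum(sorted(dist, reverse=True)[:2])


# Step lengths in days between consecutive month starts in 2022.
month_lengths_2022 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
counts = Counter(month_lengths_2022)
print(two_most_common_total(list(counts.values())))  # 11
```

Here the most common step (31 days) covers only 7 of 12 steps, while the two most common steps (31 and 30 days) cover 11 of 12.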
- neuralprophet.df_utils.get_freq_dist(ds_col)
Get frequency distribution of the ds column.
- Parameters
ds_col (pd.DataFrame) – ds column of dataframe
- Returns
numeric delta values (ms) and distribution of frequency counts
- Return type
tuple
- neuralprophet.df_utils.get_max_num_lags(config_lagged_regressors: Optional[ConfigLaggedRegressors], n_lags: int) → int
Get the greatest number of lags among the autoregression lags and the covariate lags.
- Parameters
config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors
n_lags (int) – number of lagged values of series to include as model inputs
- Returns
Maximum number of lags between the autoregression lags and the covariates lags.
- Return type
int
- neuralprophet.df_utils.handle_negative_values(df, col, handle_negatives)
Handles negative values in a column according to the handle_negatives parameter.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
col (str) – name of the regressor column
handle_negatives (str, int, float) – specified handling of negative values in the regressor column. Can be one of the following options:
- Options
remove: Remove all negative values of the regressor.
error: Raise an error in case of a negative value.
float or int: Replace negative values with the provided value.
None (default): Do not handle negative values.
- Returns
dataframe with handled negative values
- Return type
pd.DataFrame
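The options above can be sketched on a plain list of values. This hypothetical helper only mirrors the documented behaviour; in particular, "remove" here drops values from the list, whereas the library operates on the regressor column of a dataframe.

```python
def handle_negatives_sketch(values, handle_negatives=None):
    """Hypothetical list-based sketch of the documented options."""
    if handle_negatives is None:
        return list(values)  # leave negative values untouched
    if handle_negatives == "remove":
        return [v for v in values if v >= 0]
    if handle_negatives == "error":
        if any(v < 0 for v in values):
            raise ValueError("negative values found")
        return list(values)
    if isinstance(handle_negatives, (int, float)):
        # Replace each negative value with the provided number.
        return [handle_negatives if v < 0 else v for v in values]
    raise ValueError(f"unknown option: {handle_negatives!r}")


print(handle_negatives_sketch([3.0, -1.0, 2.0], 0.0))  # [3.0, 0.0, 2.0]
print(handle_negatives_sketch([3.0, -1.0, 2.0], "remove"))  # [3.0, 2.0]
```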
- neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)
Automatically infers the frequency of the dataframe.
- Parameters
df (pd.DataFrame) – dataframe with columns ds (datestamps) and y (time series values), and optionally ID
freq (str) – data step size, i.e. the frequency of data recording.
Note
Any valid frequency for pd.date_range, such as 5min, D, MS or auto (default) to automatically set frequency.
n_lags (int) – identical to NeuralProphet
min_freq_percentage (float) – threshold for defining the major frequency of the data (default: 0.7)
- Returns
Valid frequency tag according to major frequency.
- Return type
str
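The role of min_freq_percentage can be sketched as follows: compute the deltas between consecutive datestamps in milliseconds, count them, and accept the most common delta only if it covers at least that share of all steps. This hypothetical helper returns the delta rather than a frequency tag and does not handle the two-frequency exceptions.

```python
from collections import Counter
from datetime import datetime


def major_delta_ms(timestamps, min_freq_percentage=0.7):
    """Hypothetical sketch: most common step in ms, if it covers enough steps."""
    deltas_ms = [
        int((b - a).total_seconds() * 1000) for a, b in zip(timestamps, timestamps[1:])
    ]
    delta, count = Counter(deltas_ms).most_common(1)[0]
    share = count / len(deltas_ms)
    if share < min_freq_percentage:
        raise ValueError(f"no major frequency: top delta covers only {share:.0%} of steps")
    return delta


# Hourly data with one missing hour: 3 of 4 steps are 1 h, one step is 2 h.
ts = [datetime(2022, 1, 1, h) for h in (0, 1, 2, 4, 5)]
print(major_delta_ms(ts))  # 3600000
```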
- neuralprophet.df_utils.init_data_params(df, normalize='auto', config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, global_normalization=False, global_time_normalization=False)
Initialize data scaling values.
Note
We compute and store local and global normalization parameters regardless of the settings.
- Parameters
df (pd.DataFrame) – data to compute normalization parameters from.
normalize (str) – Type of normalization to apply to the time series.
options:
auto (default): soft, unless the time series is binary, in which case minmax is applied.
off: bypasses data normalization
minmax: scales the minimum value to 0.0 and the maximum value to 1.0
standardize: zero-centers and divides by the standard deviation
soft: scales the minimum value to 0.0 and the 95th quantile to 1.0
soft1: scales the minimum value to 0.1 and the 90th quantile to 0.9
config_lagged_regressors (configure.ConfigLaggedRegressors) – configurations for lagged regressors
config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values)
config_events (configure.ConfigEvents) – user-specified event configs
config_seasonality (configure.ConfigSeasonality) – user-specified seasonality configs
global_normalization (bool) –
True: sets global modeling training with global normalization
False: sets global modeling training with local normalization
global_time_normalization (bool) –
True: normalize time globally across all time series
False: normalize time locally for each time series
(only valid in case of global modeling with local normalization)
- Returns
OrderedDict – nested dict with data_params for each dataset, where each contains
OrderedDict – ShiftScale entries containing shift and scale parameters for each column
- neuralprophet.df_utils.join_dfs_after_data_drop(predicted, df, merge=False)
Creates the intersection between df and predicted, removing any dates that have been imputed and dropped in NeuralProphet.predict().
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y with all data
predicted (pd.DataFrame) – output dataframe of NeuralProphet.predict.
merge (bool) – whether to merge predicted and df into one dataframe. Options:
False (default): returns separate dataframes
True: merges predicted and df into one dataframe
- Returns
dataframe with the imputed and dropped dates removed
- Return type
pd.DataFrame
- neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, config_events: ConfigEvents, config_regressors: ConfigFutureRegressors, events_df=None, regressors_df=None)
Extends df by periods steps into the future.
- Parameters
df_columns (pd.DataFrame) – dataframe columns
last_date (pd.Datetime) – last history date
periods (int) – number of future steps to predict
freq (str) – data step size. Frequency of data recording, any valid frequency for pd.date_range, such as D or M
config_events (configure.ConfigEvents) – user-specified event configs
events_df (pd.DataFrame) – dataframe containing columns ds and event
config_regressors (configure.ConfigFutureRegressors) – configuration for user-specified regressors
regressors_df (pd.DataFrame) – dataframe containing column ds and one column for each of the external regressors
- Returns
input df with ds extended into the future, and y set to None
- Return type
pd.DataFrame
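Extending a series into the future means generating periods datestamps after last_date, one step apart, with no target value. The sketch below uses a fixed datetime.timedelta as a stand-in for freq, which is an assumption: calendar frequencies such as M need pandas date offsets instead. future_rows is a hypothetical name.

```python
from datetime import datetime, timedelta


def future_rows(last_date, periods, step):
    """Hypothetical sketch: periods future rows after last_date, with y set to None."""
    # The first future datestamp is one step after the last history date.
    return [{"ds": last_date + step * i, "y": None} for i in range(1, periods + 1)]


rows = future_rows(datetime(2022, 12, 31), periods=3, step=timedelta(days=1))
print([r["ds"].date().isoformat() for r in rows])  # ['2023-01-01', '2023-01-02', '2023-01-03']
```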
- neuralprophet.df_utils.merge_dataframes(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Join dataframes for procedures such as splitting data, setting auto seasonalities, and others.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and ID with data
- Returns
dataframe with concatenated time series (sorted by ‘ds’, duplicates removed, index reset)
- Return type
pd.DataFrame
- neuralprophet.df_utils.normalize(df, data_params)
Applies data scaling factors to df using data_params.
- Parameters
df (pd.DataFrame) – dataframe with columns ds, y (and potentially more regressors)
data_params (OrderedDict) – scaling values, as returned by init_data_params, with ShiftScale entries containing shift and scale parameters
- Returns
normalized dataframe
- Return type
pd.DataFrame
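Applying the scaling values can be sketched as subtracting each column's shift and dividing by its scale, assuming that convention for ShiftScale. The sketch uses a list of row dicts instead of a dataframe, leaves columns without scaling values unchanged, and apply_scaling is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ShiftScale:
    shift: float = 0.0
    scale: float = 1.0


def apply_scaling(rows, data_params):
    """Hypothetical sketch: (value - shift) / scale for columns with scaling values."""
    return [
        {
            col: (val - data_params[col].shift) / data_params[col].scale if col in data_params else val
            for col, val in row.items()
        }
        for row in rows
    ]


rows = [{"ds": "2022-01-01", "y": 10.0}, {"ds": "2022-01-02", "y": 20.0}]
params = {"y": ShiftScale(shift=10.0, scale=10.0)}
print(apply_scaling(rows, params))
```

With shift 10 and scale 10, the y values 10.0 and 20.0 become 0.0 and 1.0, while ds is passed through.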
- neuralprophet.df_utils.prep_or_copy_df(df: pandas.core.frame.DataFrame) → tuple[pandas.core.frame.DataFrame, bool, bool, list[str]]
Copy df if it contains the ID column. Creates an ID column with ‘__df__’ if df contains a single time series.
- Parameters
df (pd.DataFrame) – df or dict containing data
- Returns
pd.DataFrame – df with ID column
bool – whether the ID column was present
bool – whether it is a single time series
list – list of IDs
- neuralprophet.df_utils.return_df_in_original_format(df, received_ID_col=False, received_single_time_series=True)
Return dataframe in the original format.
- Parameters
df (pd.DataFrame) – df with data
received_ID_col (bool) – whether the ID column was present
received_single_time_series (bool) – whether it is a single time series
- Returns
dataframe in the original input format
- Return type
pd.DataFrame
- neuralprophet.df_utils.split_considering_timestamp(df, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)
Splits time series into train and validation sets according to the given threshold_time_stamp.
- Parameters
df (pd.DataFrame) – data with columns ds, y, and ID
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets)
threshold_time_stamp (str) – time stamp boundary that defines splitting of data
- Returns
pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data
- neuralprophet.df_utils.split_df(df: pandas.core.frame.DataFrame, n_lags: int, n_forecasts: int, valid_p: float = 0.2, inputs_overbleed: bool = True, local_split: bool = False)
Splits time series df into train and validation sets.
Prevents overbleed of targets. Overbleed of inputs can be configured. In the case of global modeling, the split can be either local or global.
- Parameters
df (pd.DataFrame) – dataframe containing columns ds, y, and optionally ID, with all data
n_lags (int) – identical to NeuralProphet
n_forecasts (int) – identical to NeuralProphet
valid_p (float, int) – fraction (0,1) of data to use for the holdout validation set, or number of validation samples >1
inputs_overbleed (bool) – whether to allow the last training targets to be the first validation inputs (never targets)
local_split (bool) – when set to True, each episode from a dict of dataframes is split locally
- Returns
pd.DataFrame, dict – training data
pd.DataFrame, dict – validation data
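The meaning of valid_p and inputs_overbleed can be sketched on row indices. In this simplified sketch (hypothetical name split_row_ranges, n_forecasts ignored), the validation set starts n_lags rows early when inputs_overbleed is True, so the last training targets serve as the first validation inputs while validation targets never overlap training targets. The library's exact index arithmetic may differ.

```python
def split_row_ranges(n_rows, n_lags, valid_p=0.2, inputs_overbleed=True):
    """Hypothetical sketch of a train/validation split over row indices.

    valid_p is either a fraction in (0, 1) or an integer count of validation rows.
    """
    if isinstance(valid_p, float) and 0.0 < valid_p < 1.0:
        n_valid = max(1, int(n_rows * valid_p))
    elif isinstance(valid_p, int) and valid_p >= 1:
        n_valid = valid_p
    else:
        raise ValueError("valid_p must be a fraction in (0, 1) or an int >= 1")
    split = n_rows - n_valid
    if split < 1:
        raise ValueError("not enough rows left for training")
    # With overbleed, validation inputs may reuse the last n_lags training rows.
    val_start = max(0, split - n_lags) if inputs_overbleed else split
    return range(0, split), range(val_start, n_rows)


train, val = split_row_ranges(n_rows=100, n_lags=5, valid_p=0.2)
print(train, val)  # range(0, 80) range(75, 100)
```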
- neuralprophet.df_utils.unfold_dict_of_folds(folds_dict, k)
Convert a dict of folds into the typical format of train and validation folds.
- Parameters
folds_dict (dict) – dict of folds
k (int) – number of folds initially set
- Returns
training data
validation data
- Return type
list of k tuples [(df_train, df_val), …]