Core Module Documentation
- class neuralprophet.df_utils.ShiftScale(shift: 'float' = 0.0, scale: 'float' = 1.0)
- neuralprophet.df_utils.add_missing_dates_nan(df, freq)
- Fills missing datetimes in ``ds``, with NaN for all other columns except ``ID``.
- Parameters
- df (pd.DataFrame) – dataframe with column ``ds`` containing datetimes
- freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as ``D`` or ``M``
 
- Returns
- dataframe without date gaps, but with NaN values
- Return type
- pd.DataFrame 
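The gap-filling idea can be sketched in plain Python. This is a minimal illustration, assuming daily data held in a dict keyed by date; the helper name `fill_date_gaps` is hypothetical and this is not the library's implementation, which operates on a pandas dataframe.

```python
import math
from datetime import date, timedelta


def fill_date_gaps(rows, step=timedelta(days=1)):
    """Return (date, y) pairs for every step from the first to the last date.

    Dates that are missing from `rows` get y = NaN.
    """
    start, end = min(rows), max(rows)
    filled = []
    current = start
    while current <= end:
        filled.append((current, rows.get(current, math.nan)))
        current += step
    return filled


# 2024-01-03 is missing from the observations.
observed = {date(2024, 1, 1): 1.0, date(2024, 1, 2): 2.0, date(2024, 1, 4): 4.0}
filled = fill_date_gaps(observed)
```

The result has one row per day, and the row inserted for the missing date carries NaN instead of a value.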
 
- neuralprophet.df_utils.add_quarter_condition(df: pandas.core.frame.DataFrame)
- Adds columns for conditional seasonalities to the df.
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds`` and ``y`` with all data
- Returns
- dataframe with added columns for conditional seasonalities
- Note
- Quarters correspond to the northern hemisphere.
- Return type
- pd.DataFrame 
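As a rough illustration of what a quarter condition looks like, the sketch below derives one 0/1 flag per season from a date. The flag names and the month groupings are assumptions made for this example (northern-hemisphere meteorological seasons), and `quarter_flags` is a hypothetical helper, not the library function.

```python
from datetime import date

# Assumed mapping of months to northern-hemisphere seasons.
SEASON_MONTHS = {
    "winter": {12, 1, 2},
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "fall": {9, 10, 11},
}


def quarter_flags(ds):
    """Return a dict of 0/1 flags, one per season, for a single date."""
    return {name: int(ds.month in months) for name, months in SEASON_MONTHS.items()}


flags = quarter_flags(date(2024, 7, 15))
```

Exactly one flag is set for any date, which is what allows a seasonality to be switched on only under that condition.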
 
- neuralprophet.df_utils.add_weekday_condition(df: pandas.core.frame.DataFrame)
- Adds columns for conditional seasonalities to the df.
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds`` and ``y`` with all data
- Returns
- dataframe with added columns for conditional seasonalities 
- Return type
- pd.DataFrame 
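A weekday condition can be pictured the same way. The sketch below assumes two complementary flags named `weekday` and `weekend`; both names and the helper `weekday_flags` are assumptions for illustration only.

```python
from datetime import date


def weekday_flags(ds):
    """Return complementary 0/1 flags for working days and weekend days."""
    is_weekend = ds.weekday() >= 5  # Monday is 0, Sunday is 6
    return {"weekday": int(not is_weekend), "weekend": int(is_weekend)}


saturday = weekday_flags(date(2024, 7, 13))
monday = weekday_flags(date(2024, 7, 15))
```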
 
- neuralprophet.df_utils.check_dataframe(df: pandas.core.frame.DataFrame, check_y: bool = True, covariates=None, regressors=None, events=None, seasonalities=None, future: Optional[bool] = None) -> Tuple[pandas.core.frame.DataFrame, List, List]
- Performs basic data sanity checks and ordering, and prepares the dataframe for fitting or predicting.
- Parameters
- df (pd.DataFrame) – containing column ``ds``
- check_y (bool) – whether df must have series values; set to True if training or predicting with autoregression
- covariates (list or dict) – covariate column names 
- regressors (list or dict) – regressor column names 
- events (list or dict) – event column names 
- seasonalities (list or dict) – seasonalities column names 
- future (bool) – if df is a future dataframe 
 
- Returns
- checked dataframe 
- Return type
- Tuple[pd.DataFrame, List, List]
 
- neuralprophet.df_utils.convert_events_to_features(df, config_events: ConfigEvents, events_df)
- Converts events information into binary features of the df.
- Parameters
- df (pd.DataFrame) – Dataframe with columns ``ds`` (datestamps) and ``y`` (time series values)
- config_events (configure.ConfigEvents) – User specified events configs
- events_df (pd.DataFrame) – containing columns ``ds`` and ``event``
 
- Returns
- input df with columns for user_specified features 
- Return type
- pd.DataFrame 
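The conversion from an events table to binary features can be sketched as follows. This is a plain-Python illustration under the assumption that each event becomes one 0/1 column; `events_to_features` is a hypothetical name and the event data is made up for the example.

```python
from datetime import date


def events_to_features(dates, events):
    """Build one 0/1 column per event name.

    `events` is a list of (event_name, date) pairs, mirroring a table
    with ``event`` and ``ds`` columns.
    """
    names = sorted({name for name, _ in events})
    occurrences = set(events)
    return {
        name: [int((name, ds) in occurrences) for ds in dates]
        for name in names
    }


dates = [date(2024, 12, 24), date(2024, 12, 25), date(2024, 12, 26)]
events = [("holiday", date(2024, 12, 25)), ("sale", date(2024, 12, 26))]
features = events_to_features(dates, events)
```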
 
- neuralprophet.df_utils.convert_num_to_str_freq(freq_num, initial_time_stamp)
- Convert numeric frequencies into frequency tags.
- Parameters
- freq_num (int) – numeric values of delta in ms 
- initial_time_stamp (str) – initial time stamp of data 
 
- Returns
- frequency tag 
- Return type
- str 
 
- neuralprophet.df_utils.convert_str_to_num_freq(freq_str)
- Convert frequency tags into numeric delta in ms.
- Parameters
- freq_str (str) – frequency tag
- Returns
- frequency numeric delta in ms 
- Return type
- numeric 
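To make the tag-to-delta relationship concrete, here is a small sketch that maps a few fixed-length frequency tags to their length in milliseconds. The lookup table and the helper `fixed_freq_to_ms` are assumptions for illustration; calendar-dependent tags such as ``M`` have no constant length, so a table like this cannot represent them.

```python
from datetime import timedelta

# Illustrative subset of fixed-length frequency tags.
FIXED_FREQ_DELTAS = {
    "S": timedelta(seconds=1),
    "min": timedelta(minutes=1),
    "H": timedelta(hours=1),
    "D": timedelta(days=1),
}


def fixed_freq_to_ms(tag):
    """Return the length of one step of `tag` in milliseconds."""
    return int(FIXED_FREQ_DELTAS[tag] / timedelta(milliseconds=1))


day_ms = fixed_freq_to_ms("D")
```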
 
- neuralprophet.df_utils.create_dict_for_events_or_regressors(df: pandas.core.frame.DataFrame, other_df: Optional[pandas.core.frame.DataFrame], other_df_name: str) -> dict
- Create a dict for events or regressors according to input df.
- Parameters
- df (pd.DataFrame) – Dataframe with columns ``ds`` (datestamps) and ``y`` (time series values)
- other_df (pd.DataFrame) – Dataframe with events or regressors 
- other_df_name (str) – Definition of other_df (i.e. ‘events’, ‘regressors’) 
 
- Returns
- dictionary with events or regressors 
- Return type
- dict 
 
- neuralprophet.df_utils.create_dummy_datestamps(df, freq='S', startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0)
- Helper function to create a dummy series of datestamps for equidistant data without ``ds``.
- Parameters
- df (pd.DataFrame) – dataframe with column ``y`` and without column ``ds``
- freq (str) – Frequency of data recording, any valid frequency for pd.date_range, such as ``D`` or ``M``
- startyear (int) – Defines the first datestamp
- startmonth (int) – Defines the first datestamp
- startday (int) – Defines the first datestamp
- starthour (int) – Defines the first datestamp
- startminute (int) – Defines the first datestamp
- startsecond (int) – Defines the first datestamp
- Returns
- dataframe with dummy equidistant datestamps 
- Return type
- pd.DataFrame 
- Examples
- Adding dummy datestamps to a dataframe without datestamps. To prepare the dataframe for training, import df_utils and insert your preferred dates.
    >>> from neuralprophet import df_utils
    >>> df_drop = df.drop("ds", axis=1)
    >>> df_dummy = df_utils.create_dummy_datestamps(
    ...     df_drop, freq="S", startyear=1970, startmonth=1, startday=1, starthour=0, startminute=0, startsecond=0
    ... )
- neuralprophet.df_utils.create_mask_for_prediction_frequency(prediction_frequency, ds, forecast_lag)
- Creates a mask for the yhat array, to select the correct values for the prediction frequency. This method is only called in _reshape_raw_predictions_to_forecst_df within NeuralProphet.predict().
- Parameters
- prediction_frequency (dict) – identical to NeuralProphet 
- ds (pd.Series) – datestamps of the predictions 
- forecast_lag (int) – current forecast lag 
 
- Returns
- mask for the yhat array 
- Return type
- np.array 
 
- neuralprophet.df_utils.crossvalidation_split_df(df, n_lags, n_forecasts, k, fold_pct, fold_overlap_pct=0.0, global_model_cv_type='global-time')
- Splits data in k folds for crossvalidation.
- Parameters
- df (pd.DataFrame) – data 
- n_lags (int) – identical to NeuralProphet 
- n_forecasts (int) – identical to NeuralProphet 
- k (int) – number of CV folds 
- fold_pct (float) – percentage of overall samples to be in each fold 
- fold_overlap_pct (float) – percentage of overlap between the validation folds (default: 0.0) 
- global_model_cv_type (str) – Type of crossvalidation to apply to the time series.
- Options
- ``global-time`` (default): crossvalidation is performed according to a time stamp threshold.
- ``local``: each episode will be crossvalidated locally (may cause time leakage among different episodes).
- ``intersect``: only the time intersection of all the episodes will be considered. A considerable amount of data may not be used. However, this approach guarantees an equal number of train/test samples for each episode.
 
- Returns
- training data and validation data for each fold
- Return type
- list of k tuples [(df_train, df_val), …] 
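How `k`, `fold_pct` and `fold_overlap_pct` interact is easier to see on row indices. The sketch below is a simplified expanding-window scheme that ignores `n_lags` and `n_forecasts`; the function name `crossvalidation_folds` and the exact index arithmetic are assumptions for illustration, not the library's computation.

```python
def crossvalidation_folds(n_samples, k, fold_pct, fold_overlap_pct=0.0):
    """Return k (train_indices, val_indices) pairs, earliest fold first.

    Simplified: works on row indices only and ignores n_lags and n_forecasts.
    """
    fold_size = max(1, int(fold_pct * n_samples))
    overlap = int(fold_overlap_pct * fold_size)
    folds = []
    end = n_samples
    for _ in range(k):
        val_start = end - fold_size
        if val_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        # Train on everything before the validation window.
        folds.append((range(0, val_start), range(val_start, end)))
        # The next (earlier) fold ends where this one began, plus any overlap.
        end = val_start + overlap
    return folds[::-1]


folds = crossvalidation_folds(n_samples=100, k=3, fold_pct=0.1)
```

With 100 samples, three folds and `fold_pct=0.1`, the validation windows are rows 70 to 79, 80 to 89 and 90 to 99, and each training set contains all rows before its validation window.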
 
- neuralprophet.df_utils.data_params_definition(df, normalize, config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, local_run_despite_global: Optional[bool] = None)
- Initialize data scaling values.
- Note
- We apply z-normalization to the target series ``y``, unlike the original Prophet, which shifts by the minimum and scales by the maximum.
- Parameters
- df (pd.DataFrame) – Time series to compute normalization parameters from.
- normalize (str) – Type of normalization to apply to the time series.
- Options
- ``soft`` (default), unless the time series is binary, in which case ``minmax`` is applied.
- ``off``: bypasses data normalization
- ``minmax``: scales the minimum value to 0.0 and the maximum value to 1.0
- ``standardize``: zero-centers and divides by the standard deviation
- ``soft``: scales the minimum value to 0.0 and the 95th quantile to 1.0
- ``soft1``: scales the minimum value to 0.1 and the 90th quantile to 0.9
- config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors
- config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) with sub_parameters normalize (bool) 
- config_events (configure.ConfigEvents) – user specified events configs 
- config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs 
 
- Returns
- scaling values with ShiftScale entries containing ``shift`` and ``scale`` parameters.
- Return type
- OrderedDict 
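The normalization options can be read as different choices of a shift and a scale. The sketch below derives both values from the option descriptions, assuming the normalized value is `(x - shift) / scale`. The helper names, the interpolated quantile and the use of the population standard deviation are assumptions for illustration, not the library's code.

```python
import statistics


def quantile(values, q):
    """Linearly interpolated quantile of a list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def shift_scale(values, normalize):
    """Return (shift, scale) so that (x - shift) / scale is the normalized value."""
    lowest = min(values)
    if normalize == "off":
        return 0.0, 1.0
    if normalize == "minmax":
        return lowest, max(values) - lowest
    if normalize == "standardize":
        return statistics.fmean(values), statistics.pstdev(values)
    if normalize == "soft":
        return lowest, quantile(values, 0.95) - lowest
    if normalize == "soft1":
        span = quantile(values, 0.90) - lowest
        # Widen the range so the minimum maps to 0.1 and the 90th quantile to 0.9.
        return lowest - 0.125 * span, 1.25 * span
    raise ValueError(f"unknown normalization: {normalize}")


y = [float(v) for v in range(101)]  # 0.0, 1.0, ..., 100.0
soft_shift, soft_scale = shift_scale(y, "soft")
soft1_shift, soft1_scale = shift_scale(y, "soft1")
```

For `soft1`, shifting by `min - 0.125 * span` and scaling by `1.25 * span` is one way to satisfy both stated targets, since `0.125 / 1.25 = 0.1` and `1.125 / 1.25 = 0.9`.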
 
- neuralprophet.df_utils.double_crossvalidation_split_df(df, n_lags, n_forecasts, k, valid_pct, test_pct)
- Splits data in two sets of k folds for crossvalidation on validation and test data.
- Parameters
- df (pd.DataFrame) – data 
- n_lags (int) – identical to NeuralProphet 
- n_forecasts (int) – identical to NeuralProphet 
- k (int) – number of CV folds 
- valid_pct (float) – percentage of overall samples to be in validation 
- test_pct (float) – percentage of overall samples to be in test 
 
- Returns
- elements same as ``crossvalidation_split_df()`` returns
- Return type
- tuple of k tuples [(folds_val, folds_test), …] 
 
- neuralprophet.df_utils.drop_missing_from_df(df, drop_missing, predict_steps, n_lags)
- Drops windows of missing values in df according to the (lagged) samples that are dropped from TimeDataset.
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds`` and ``y`` with all data
- drop_missing (bool) – identical to NeuralProphet 
- predict_steps (int) – number of forecast steps, identical to n_forecasts in NeuralProphet
- n_lags (int) – identical to NeuralProphet 
 
- Returns
- dataframe with dropped NaN windows 
- Return type
- pd.DataFrame 
 
- neuralprophet.df_utils.fill_linear_then_rolling_avg(series, limit_linear, rolling)
- Adds missing dates, fills missing values with linear imputation or trend.
- Parameters
- series (pd.Series) – series with NaN values to be filled in.
- limit_linear (int) – maximum number of missing values to impute. Note: because imputation is done in both directions, this value is effectively doubled.
- rolling (int) – maximum number of missing values to impute. Note: the window width is rolling + 2*limit_linear.
 
- Returns
- manipulated dataframe containing filled values 
- Return type
- pd.DataFrame 
 
- neuralprophet.df_utils.find_time_threshold(df, n_lags, n_forecasts, valid_p, inputs_overbleed)
- Find time threshold for dividing timeseries into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured.
- Parameters
- df (pd.DataFrame) – data with columns ``ds``, ``y``, and ``ID``
- n_lags (int) – identical to NeuralProphet
- n_forecasts (int) – identical to NeuralProphet
- valid_p (float) – fraction (0,1) of data to use for holdout validation set 
- inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets) 
 
- Returns
- time stamp threshold that defines the boundary between the train and validation sets.
- Return type
- str 
 
- neuralprophet.df_utils.find_valid_time_interval_for_cv(df)
- Finds the time interval of intersection among all the time series.
- Parameters
- df (pd.DataFrame) – data with columns ``ds``, ``y``, and ``ID``
- Returns
- str – time interval start 
- str – time interval end 
 
 
- neuralprophet.df_utils.get_dist_considering_two_freqs(dist)
- Adds the occurrences of the two most common frequencies.
- Note
- Useful for the frequency exceptions (i.e. ``M``, ``Y``, ``Q``, ``B``, and ``BH``).
- Parameters
- dist (list) – list of occurrence counts of frequencies
- Returns
- sum of the occurrence counts of the two most common frequencies
- Return type
- numeric 
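Why two frequencies are summed: monthly data, for example, produces steps of 28 to 31 days, so no single step length dominates the counts. A minimal sketch of the summation, with the hypothetical helper name `sum_two_most_common` and made-up counts:

```python
def sum_two_most_common(dist):
    """Return the sum of the two largest occurrence counts in `dist`."""
    top_two = sorted(dist, reverse=True)[:2]
    return sum(top_two)


# Example counts for four different step lengths.
combined = sum_two_most_common([3, 28, 1, 30])
```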
 
- neuralprophet.df_utils.get_freq_dist(ds_col)
- Get frequency distribution of the ``ds`` column.
- Parameters
- ds_col (pd.DataFrame) – ``ds`` column of dataframe
- Returns
- numeric delta values (``ms``) and distribution of frequency counts
- Return type
- tuple 
 
- neuralprophet.df_utils.get_max_num_lags(config_lagged_regressors: Optional[ConfigLaggedRegressors], n_lags: int) -> int
- Get the greatest number of lags between the autoregression lags and the covariates lags.
- Parameters
- config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors 
- n_lags (int) – number of lagged values of series to include as model inputs 
 
- Returns
- Maximum number of lags between the autoregression lags and the covariates lags. 
- Return type
- int 
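In essence this is a maximum over the autoregression lags and each lagged regressor's lags. The sketch below represents the lagged-regressor configuration as a plain dict from name to number of lags, which is an assumption made for this example; `max_num_lags` is a hypothetical helper.

```python
def max_num_lags(lagged_regressor_lags, n_lags):
    """Return the largest lag count among n_lags and all lagged regressors.

    `lagged_regressor_lags` maps regressor name -> number of lags, or is None.
    """
    if lagged_regressor_lags:
        return max([n_lags, *lagged_regressor_lags.values()])
    return n_lags


longest = max_num_lags({"temperature": 5, "rain": 12}, n_lags=7)
```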
 
- neuralprophet.df_utils.handle_negative_values(df, col, handle_negatives)
- Handles negative values in a column according to the handle_negatives parameter.
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds`` and ``y`` with all data
- col (str) – name of the regressor column
- handle_negatives (str, int, float) – specified handling of negative values in the regressor column. Can be one of the following options:
- Options
- ``remove``: Remove all negative values of the regressor.
- ``error``: Raise an error in case of a negative value.
- ``float`` or ``int``: Replace negative values with the provided value.
- ``None`` (default): Do not handle negative values.
 
 
 
- Returns
- dataframe with handled negative values 
- Return type
- pd.DataFrame 
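The four options can be sketched on a plain list of values. This is only an illustration of the documented behaviour: the helper name `handle_negatives_in` is hypothetical, and it filters list entries where the library works on dataframe rows.

```python
def handle_negatives_in(values, handle_negatives=None):
    """Apply one of the documented negative-value policies to a list."""
    if handle_negatives is None:
        return list(values)  # leave the data untouched
    if handle_negatives == "remove":
        return [v for v in values if v >= 0]
    if handle_negatives == "error":
        if any(v < 0 for v in values):
            raise ValueError("negative values found")
        return list(values)
    if isinstance(handle_negatives, (int, float)):
        # Replace each negative value with the provided number.
        return [handle_negatives if v < 0 else v for v in values]
    raise ValueError(f"unsupported option: {handle_negatives!r}")


removed = handle_negatives_in([1, -2, 3], "remove")
replaced = handle_negatives_in([1, -2, 3], 0)
```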
 
- neuralprophet.df_utils.infer_frequency(df, freq, n_lags, min_freq_percentage=0.7)
- Automatically infers frequency of dataframe.
- Parameters
- df (pd.DataFrame) – Dataframe with columns ``ds`` (datestamps) and ``y`` (time series values), and optionally ``ID``
- freq (str) – Data step sizes, i.e. frequency of data recording. Note: any valid frequency for pd.date_range, such as ``5min``, ``D``, ``MS`` or ``auto`` (default) to automatically set frequency.
- n_lags (int) – identical to NeuralProphet
- min_freq_percentage (float) – threshold for defining major frequency of data (default: ``0.7``)
 
- Returns
- Valid frequency tag according to major frequency. 
- Return type
- str 
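The role of `min_freq_percentage` can be sketched as follows: count the steps between consecutive timestamps and accept the most common one only if its share reaches the threshold. The helper `major_delta` is hypothetical, and whether the comparison is inclusive is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta


def major_delta(timestamps, min_freq_percentage=0.7):
    """Return the dominant step between consecutive timestamps, or None."""
    deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    delta, count = Counter(deltas).most_common(1)[0]
    if count / len(deltas) >= min_freq_percentage:
        return delta
    return None


start = datetime(2024, 1, 1)
# Hourly data with one missing observation (hour 5).
stamps = [start + timedelta(hours=h) for h in range(11) if h != 5]
step = major_delta(stamps)
```

Here eight of the nine steps are one hour long, so the hourly step clears the default threshold of 0.7 despite the gap.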
 
- neuralprophet.df_utils.init_data_params(df, normalize='auto', config_lagged_regressors: Optional[ConfigLaggedRegressors] = None, config_regressors: Optional[ConfigFutureRegressors] = None, config_events: Optional[ConfigEvents] = None, config_seasonality: Optional[ConfigSeasonality] = None, global_normalization=False, global_time_normalization=False)
- Initialize data scaling values.
- Note
- We compute and store local and global normalization parameters independent of settings.
- Parameters
- df (pd.DataFrame) – data to compute normalization parameters from.
- normalize (str) – Type of normalization to apply to the time series.
- Options
- ``soft`` (default), unless the time series is binary, in which case ``minmax`` is applied.
- ``off``: bypasses data normalization
- ``minmax``: scales the minimum value to 0.0 and the maximum value to 1.0
- ``standardize``: zero-centers and divides by the standard deviation
- ``soft``: scales the minimum value to 0.0 and the 95th quantile to 1.0
- ``soft1``: scales the minimum value to 0.1 and the 90th quantile to 0.9
- config_lagged_regressors (configure.ConfigLaggedRegressors) – Configurations for lagged regressors 
- config_regressors (configure.ConfigFutureRegressors) – extra regressors (with known future values) 
- config_events (configure.ConfigEvents) – user specified events configs 
- config_seasonality (configure.ConfigSeasonality) – user specified seasonality configs 
- global_normalization (bool) – ``True``: sets global modeling training with global normalization. ``False``: sets global modeling training with local normalization.
- global_time_normalization (bool) – ``True``: normalize time globally across all time series. ``False``: normalize time locally for each time series (only valid in case of global modeling with local normalization).
 
- Returns
- OrderedDict – nested dict with data_params for each dataset, where each contains
- OrderedDict – ShiftScale entries containing ``shift`` and ``scale`` parameters for each column
 
 
- neuralprophet.df_utils.join_dfs_after_data_drop(predicted, df, merge=False)
- Creates the intersection between df and predicted, removing any dates that have been imputed and dropped in NeuralProphet.predict().
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds`` and ``y`` with all data
- predicted (pd.DataFrame) – output dataframe of NeuralProphet.predict.
- merge (bool) – whether to merge predicted and df into one dataframe. Options: ``False`` (default) returns separate dataframes; ``True`` merges predicted and df into one dataframe.
 
- Returns
- dataframe from which the dates that were imputed and dropped have been removed
- Return type
- pd.DataFrame 
 
- neuralprophet.df_utils.make_future_df(df_columns, last_date, periods, freq, config_events: ConfigEvents, config_regressors: ConfigFutureRegressors, events_df=None, regressors_df=None)
- Extends df by ``periods`` steps into the future.
- Parameters
- df_columns (pd.DataFrame) – Dataframe columns 
- last_date (pd.Datetime) – last history date 
- periods (int) – number of future steps to predict 
- freq (str) – Data step sizes. Frequency of data recording, any valid frequency for pd.date_range, such as ``D`` or ``M``
- config_events (configure.ConfigEvents) – User specified events configs 
- events_df (pd.DataFrame) – containing columns ``ds`` and ``event``
- config_regressors (configure.ConfigFutureRegressors) – configuration for user specified regressors
- regressors_df (pd.DataFrame) – containing column ``ds`` and one column for each of the external regressors
 
- Returns
- input df with ``ds`` extended into the future, and ``y`` set to None
- Return type
- pd.DataFrame 
 
- neuralprophet.df_utils.merge_dataframes(df: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
- Join dataframes for procedures such as splitting data, setting auto seasonalities, and others.
- Parameters
- df (pd.DataFrame) – containing columns ``ds``, ``y``, and ``ID`` with data
- Returns
- Dataframe with concatenated time series (sorted ‘ds’, duplicates removed, index reset) 
- Return type
- pd.DataFrame
 
- neuralprophet.df_utils.normalize(df, data_params)
- Applies data scaling factors to df using data_params.
- Parameters
- df (pd.DataFrame) – with columns ``ds``, ``y`` (and potentially more regressors)
- data_params (OrderedDict) – scaling values, as returned by init_data_params, with ShiftScale entries containing ``shift`` and ``scale`` parameters
 
- Returns
- normalized dataframes 
- Return type
- pd.DataFrame 
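Applying the scaling values can be sketched with a small stand-in for `ShiftScale`. The dataclass below mirrors the documented `shift` and `scale` fields and their defaults. The transform `(x - shift) / scale`, the helper `normalize_columns` and the dict-of-lists data layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShiftScale:
    """Stand-in with the documented fields and defaults."""
    shift: float = 0.0
    scale: float = 1.0


def normalize_columns(columns, data_params):
    """Scale every column that has an entry in data_params; copy the rest."""
    normalized = {}
    for name, values in columns.items():
        if name in data_params:
            params = data_params[name]
            normalized[name] = [(v - params.shift) / params.scale for v in values]
        else:
            normalized[name] = list(values)
    return normalized


columns = {"y": [10.0, 15.0, 20.0], "flag": [0, 1, 0]}
data_params = {"y": ShiftScale(shift=10.0, scale=10.0)}
scaled = normalize_columns(columns, data_params)
```

With the default `ShiftScale()` the values pass through unchanged, which matches the `off` option described for `init_data_params`.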
 
- neuralprophet.df_utils.prep_or_copy_df(df: pandas.core.frame.DataFrame) -> tuple[pandas.core.frame.DataFrame, bool, bool, list[str]]
- Copy df if it contains the ID column. Creates ID column with ‘__df__’ if it is a df with a single time series.
- Parameters
- df (pd.DataFrame) – df or dict containing data
- Returns
- pd.DataFrame – df with ID col
- bool – whether the ID col was present
- bool – whether it is a single time series
- list – list of IDs 
 
 
- neuralprophet.df_utils.return_df_in_original_format(df, received_ID_col=False, received_single_time_series=True)
- Return dataframe in the original format.
- Parameters
- df (pd.DataFrame) – df with data 
- received_ID_col (bool) – whether the ID col was present 
- received_single_time_series (bool) – whether it is a single time series
 
- Returns
- original input format 
- Return type
- pd.DataFrame
 
- neuralprophet.df_utils.split_considering_timestamp(df, n_lags, n_forecasts, inputs_overbleed, threshold_time_stamp)
- Splits timeseries into train and validation sets according to given threshold_time_stamp.
- Parameters
- df (pd.DataFrame) – data with columns ``ds``, ``y``, and ``ID``
- n_lags (int) – identical to NeuralProphet 
- n_forecasts (int) – identical to NeuralProphet 
- inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets) 
- threshold_time_stamp (str) – time stamp boundary that defines splitting of data 
 
- Returns
- pd.DataFrame, dict – training data 
- pd.DataFrame, dict – validation data 
 
 
- neuralprophet.df_utils.split_df(df: pandas.core.frame.DataFrame, n_lags: int, n_forecasts: int, valid_p: float = 0.2, inputs_overbleed: bool = True, local_split: bool = False)
- Splits timeseries df into train and validation sets. Prevents overbleed of targets. Overbleed of inputs can be configured. In case of global modeling the split could be either local or global.
- Parameters
- df (pd.DataFrame) – dataframe containing columns ``ds``, ``y``, and optionally ``ID`` with all data
- n_lags (int) – identical to NeuralProphet 
- n_forecasts (int) – identical to NeuralProphet 
- valid_p (float, int) – fraction (0,1) of data to use for holdout validation set, or number of validation samples >1 
- inputs_overbleed (bool) – Whether to allow last training targets to be first validation inputs (never targets) 
- local_split (bool) – when set to true, each episode from a dict of dataframes will be split locally 
 
- Returns
- pd.DataFrame, dict – training data 
- pd.DataFrame, dict – validation data 
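The effect of `valid_p` and `inputs_overbleed` can be sketched on row indices. This simplified version ignores `n_forecasts`, so the boundaries will not match the library's exactly; `split_indices` is a hypothetical helper and the arithmetic is an assumption for illustration.

```python
def split_indices(n_rows, n_lags, valid_p=0.2, inputs_overbleed=True):
    """Return (train_indices, val_indices) for a time-ordered holdout split.

    Simplified: ignores n_forecasts when placing the boundary.
    """
    if 0 < valid_p < 1:
        n_valid = max(1, int(n_rows * valid_p))  # fraction of the data
    else:
        n_valid = int(valid_p)  # absolute number of validation samples
    split = n_rows - n_valid
    # With overbleed, the last n_lags training rows are reused as the first
    # validation inputs (never as validation targets).
    val_start = max(0, split - n_lags) if inputs_overbleed else split
    return range(0, split), range(val_start, n_rows)


train_idx, val_idx = split_indices(100, n_lags=3, valid_p=0.25)
```

In this example the training rows are 0 to 74, and the validation rows start at 72 so that the first validation target at row 75 has its three lagged inputs available.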
 
 
- neuralprophet.df_utils.unfold_dict_of_folds(folds_dict, k)
- Converts a dict of folds into the typical format for folds of train and test data.
- Parameters
- folds_dict (dict) – dict of folds 
- k (int) – number of folds initially set 
 
- Returns
- training data and validation data for each fold
- Return type
- list of k tuples [(df_train, df_val), …]
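The unfolding step can be pictured as regrouping per-series folds into per-fold data. The sketch below assumes `folds_dict` maps each series ID to a list of `k` `(train, val)` pairs and uses plain lists in place of dataframes; `unfold_folds` is a hypothetical helper.

```python
def unfold_folds(folds_dict, k):
    """Turn {series_id: [(train, val), ...]} into [(train, val), ...] of length k."""
    folds = []
    for j in range(k):
        train, val = [], []
        for series_id, series_folds in folds_dict.items():
            if len(series_folds) != k:
                raise ValueError(f"series {series_id!r} does not have {k} folds")
            fold_train, fold_val = series_folds[j]
            # Concatenate fold j of every series.
            train.extend(fold_train)
            val.extend(fold_val)
        folds.append((train, val))
    return folds


folds_dict = {
    "a": [([1], [2]), ([1, 2], [3])],
    "b": [([10], [20]), ([10, 20], [30])],
}
unfolded = unfold_folds(folds_dict, k=2)
```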