Inflation Forecasting

Project by Swati Sethee and Shruti Gupta
Ashoka University

May, 2020

INTRODUCTION

Inflation forecasting is of paramount importance for conducting monetary policy. Since monetary policy transmission is associated with significant lags, central banks aiming to achieve price stability need to be forward-looking in their decisions. Hence, the formation of expectations is central to the question of what drives and assists prediction of inflation. Good inflation forecasts are not only desirable, but also important for correct policy formulation and future adjustments.

1. SARIMAX Model

Seasonal Auto-Regressive Integrated Moving Average with eXogenous regressors, SARIMAX or Seasonal ARIMAX, is an extension of ARIMAX that explicitly supports univariate time series data with a seasonal component. The seasonal part of the model consists of terms that are very similar to the non-seasonal components, but they involve back-shifts of the seasonal period.
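
For example, with monthly data (seasonal period m = 12), a SARIMA(1,1,1)(1,1,1)₁₂ model written with the backshift operator B is:

(1 − φ₁B)(1 − Φ₁B¹²)(1 − B)(1 − B¹²) yₜ = (1 + θ₁B)(1 + Θ₁B¹²) εₜ

where εₜ is white noise; the seasonal terms mirror the non-seasonal ones but act on powers of B¹².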

We demonstrate the application of the SARIMAX model to forecasting Indian monthly inflation. Using long-span historical data, we build a robust and well-behaved model that captures structural variations in Indian monthly inflation. The forecasts are reasonable and fall within the model's confidence bands.

Importing the Required Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import datetime, statsmodels, warnings
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import acf, pacf
warnings.simplefilter("ignore")
from sklearn.metrics import mean_squared_error
from math import sqrt
from dateutil.relativedelta import relativedelta
from datetime import date, timedelta
import ipywidgets as widgets

Loading the Data Set

You can access the dataset and the model’s notebook from the link here.

#Loading the dataset
data = pd.read_excel("...file path...")

Data Source: Organization for Economic Co-operation and Development, Consumer Price Index: All Items for India [INDCPIALLMINMEI], retrieved from FRED, Federal Reserve Bank of St. Louis

Seasonal Trend

#Generate a widget
widget = widgets.IntRangeSlider(value=[1975, 2000], min=1960, max=2020, step=1, description='Time Period:',
disabled=False, continuous_update=True, orientation='horizontal', readout=True, readout_format='d')
widget
#Prepare Data
years = widget.value
years = list(range(years[0],years[1]+1))
data['Year'] = pd.DatetimeIndex(data['observation_date']).year
data['Month'] = [d.strftime('%b') for d in data.observation_date]
#Draw Plot
fig = go.Figure()
for i, y in enumerate(years):
    if i < len(years):
        fig.add_trace(go.Scatter(
            x=data[data['Year']==years[i]].Month.tolist(),
            y=data[data['Year']==years[i]].Monthly_Rate.round(2).tolist(),
            name=str(years[i])))
fig.update_layout(title=go.layout.Title(
text="Seasonal Trend of Monthly Inflation Rate",
xref="paper",
x=0),
margin=dict(l=10, r=0, t=50, b=50),
xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text="Month",font=dict(
family="Courier New, monospace",
size=18,
color="#7f7f7f"))),
yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text="Inflation",font=dict(
family="Courier New, monospace",
size=18,
color="#7f7f7f"))))
fig.update_yaxes(nticks=10)

fig.show()

From the graph above it is evident that the inflation rate is high in April, peaks again in June and July, dips in August, peaks in October and finally dips in December. This shows that there exists a noticeable seasonal trend in our time series data. This important observation leads us to believe that we will be required to use SARIMAX instead of the simpler ARIMAX model.

EXPLORATORY ANALYSIS

Splitting the data into two time periods: Train and Valid

We first train our model on past data using the ‘Train’ dataset, and then forecast monthly inflation, which we validate against the actual values in the ‘Valid’ dataset.

Train data: 1960–1995;
Valid data: 1996–2020

Our training set will be around 60% and valid set will be around 40% of the entire data set.

Train = data[data.Year < 1996]
Valid = data[data.Year > 1995]

Visualise the data

fig= go.Figure()
fig.add_trace(go.Scatter(dict(x=Train.observation_date, y=Train.Monthly_Rate, mode='lines+markers', name='Train')))
fig.add_trace(go.Scatter(dict(x=Valid.observation_date, y=Valid.Monthly_Rate, mode='lines+markers', name='Valid')))
fig.update_layout(title=go.layout.Title(
    text="Monthly Inflation Rates for 1960–2020",
    xref="paper", x=0),
    margin=dict(l=10, r=0, t=50, b=50),
    xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text="Year", font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"))),
    yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text="Inflation", font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"))))
fig.show()

Blue line denotes the Train data and red line denotes the Valid data

A stationary time series is one whose statistical properties, such as mean, variance and autocorrelation, are constant over time. The property of constant variance is known as homoskedasticity. If a time series is stationary and exhibits a particular behaviour over a given time interval, then it is safe to assume that it will exhibit the same behaviour at some later point in time. A stationarised series is therefore easier to predict.

The Train data does not appear stationary, as the inflation rates do not fluctuate around a constant mean. However, this is a very informal way to check for stationarity.

SEASONAL DECOMPOSITION OF DATA

Time series data is composed of level, trend, seasonality and random noise. Let's decompose the data and plot the trend, seasonality and randomness. We use statsmodels for the seasonal decomposition; the frequency of the time series, i.e. its periodicity, is 12 because the data is monthly with a yearly seasonal cycle.

result = seasonal_decompose(Train['Monthly_Rate'].dropna(), model='additive', freq =12)
result.plot()
plt.show()

From this graph we can infer whether the data has any sort of cyclicality in any component. The graphs labelled ‘Observed’, ‘Trend’ and ‘Residual’ fluctuate randomly, i.e. they exhibit no systematic pattern. The ‘Seasonal’ graph, however, shows an evident cyclical pattern, implying seasonality in the dataset that we are required to remove before proceeding to modelling.

Formal checks for Stationarity

There are two primary ways to determine whether a given time series is stationary or not:

  • Plotting data along with Rolling Average and Rolling Standard Deviation: The time series is stationary if they remain steady with time.
  • Augmented Dickey-Fuller Test: The time series is considered stationary if the p-value is low (<0.05) and the Test Statistic is lower than the critical values at 1%, 5%, 10% levels of significance.

A 12-month rolling mean averages out the inflation rates for months 1–12 to give the first data point at the 12th position. For the next data point, it would drop the first inflation rate and add the rate at the 13th month to calculate the average inflation rates for months 2–13, and so on.

#Defining the check for Data Stationarity using Rolling Mean and Rolling Standard Deviation
def TestStationaryPlot(ts):
    rol_mean = ts.rolling(window=12, center=False).mean()
    rol_std = ts.rolling(window=12, center=False).std()

    fig = go.Figure()
    fig.add_trace(go.Scatter(dict(x=data.observation_date, y=ts, mode='lines+markers', name='Original Data')))
    fig.add_trace(go.Scatter(dict(x=data.observation_date, y=rol_mean, mode='lines', name='Rolling Mean', line=dict(color='black', width=2))))
    fig.add_trace(go.Scatter(dict(x=data.observation_date, y=rol_std, mode='lines', name='Rolling Standard', line=dict(color='brown', width=2))))
    fig.update_layout(title=go.layout.Title(
        text="Rolling Mean & Standard Deviation",
        xref="paper", x=0),
        margin=dict(l=10, r=0, t=50, b=50),
        xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text="Year", font=dict(
            family="Courier New, monospace",
            size=18,
            color="#7f7f7f"))),
        yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text="Inflation", font=dict(
            family="Courier New, monospace",
            size=18,
            color="#7f7f7f"))))
    fig.show()

#Defining the check for Data Stationarity using the Augmented Dickey-Fuller (ADF) test
def adf_test(timeseries):
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='BIC', regression='c')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

We use the ADF test to check stationarity, with the Bayesian Information Criterion (BIC) used to select the lag length. The BIC is based on the likelihood function and is closely related to the Akaike Information Criterion (AIC). It is possible to increase the likelihood of a model by adding unnecessary parameters, which causes overfitting. The BIC addresses this by imposing a greater penalty on the number of parameters than the AIC.
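
For a model with k estimated parameters, n observations and maximized likelihood L, the two criteria are:

AIC = 2k − 2·ln(L)
BIC = k·ln(n) − 2·ln(L)

Since ln(n) exceeds 2 once n ≥ 8, the BIC penalises each additional parameter more heavily than the AIC.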

TestStationaryPlot(Train["Monthly_Rate"].dropna(inplace=False))
adf_test(Train["Monthly_Rate"].dropna())

The graph suggests the dataset is not stationary: the rolling mean and standard deviation fluctuate considerably over time. The ADF test, in contrast, suggests the series is stationary, since the p-value is extremely small and the Test Statistic is lower than the critical values at the 1%, 5% and 10% significance levels. Given the visible trend and seasonality, however, we treat the Train series as non-stationary and apply differencing.

DIFFERENCING — ELIMINATING TREND AND SEASONALITY

Seasonal First Difference

Train_seasonal_difference = Train["Monthly_Rate"] - Train["Monthly_Rate"].shift(12)
Train_seasonal_first_difference = Train_seasonal_difference - Train_seasonal_difference.shift(1)
#Plotting and testing the time series for stationarity
TestStationaryPlot(Train_seasonal_first_difference.dropna(inplace=False))
adf_test(Train_seasonal_first_difference.dropna(inplace=False))

From the graph we can observe that the rolling mean has now stabilised greatly and the rolling standard deviation has also smoothed out somewhat, indicating that this series is stationary. The ADF test shows that the p-value is extremely small and the Test Statistic is below all the critical values reported by the test. This supports our observation from the graph, so we can conclude that the seasonal first difference of the Train data is stationary.

Checking degree of seasonality over various data sets

#Seasonal Decomposition
#First difference of the Train data, for comparison with the seasonal first difference
Train_first_difference = Train["Monthly_Rate"] - Train["Monthly_Rate"].shift(1)
result = seasonal_decompose(data['Monthly_Rate'].dropna(), model='additive', freq=12)
result.plot()
plt.suptitle('Original Data', y=1)
plt.show()
result1= seasonal_decompose(Train_first_difference.dropna(), model= 'additive', freq =12)
result1.plot()
plt.suptitle('First difference of Train Data', y=1)
plt.show()
result2=seasonal_decompose(Train_seasonal_first_difference.dropna(), model='additive', freq =12)
result2.plot()
plt.suptitle('First difference of Seasonal Train Data', y=1)
plt.show()

We observe that the intensity of the seasonal trend dampens as we progress: it is strongest in the original data, weaker in the first difference of the Train data, and weaker still in the seasonal first difference of the Train data. We can make these claims by observing that the cycles become more spaced out in each subsequent graph.

BUILDING THE MODEL

Selecting Parameters using Autocorrelation and Partial Autocorrelation graphs

The parameters of the ARIMA model are defined as follows:

  • p: The number of lag observations included in the model, also called the Autoregressive terms
  • d: The number of times that the raw observations are differenced, also called the degree of differencing
  • q: The size of the moving average window, also called the Moving Average terms

fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
#Autocorrelation Graph for 40 periods
fig = sm.graphics.tsa.plot_acf(Train_seasonal_first_difference.dropna(), lags=40, ax=ax1)
#Partial autocorrelation Graph for 40 periods
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(Train_seasonal_first_difference.dropna(), lags=40, ax=ax2)

The shaded region in both the graphs represents a 95% confidence interval band. Anything outside the shaded region signifies a statistically significant correlation.

  • The autocorrelation graph displays the correlation between the observation at the current point in time and the observations at all previous lags. Significant spikes in the autocorrelation graph indicate the number of Moving Average (MA) terms.
  • The partial autocorrelation graph displays the correlation at each lag after removing the effect of the intermediate lags, i.e. the duration of the influence of a random shock. Significant spikes in the partial autocorrelation graph indicate the number of Auto-Regressive (AR) terms, or lags.

In the autocorrelation graph we observe significant spikes at lags 0, 1, 2 and 3, but in order to avoid the risk of over-fitting our model we do not consider 3. In the partial autocorrelation graph we observe significant spikes at lags 0, 1 and 2 only.

Auto-Regressive terms = Lags = p = [0,1,2] ;
Moving Average terms = q = [0,1,2]

SARIMAX Model

There are four seasonal elements that are not part of ARIMA that must be configured in the SARIMAX model; they are:

  • P: Seasonal autoregressive order
  • D: Seasonal difference order
  • Q: Seasonal moving average order
  • m: The number of time steps for a single seasonal period

The notation for a SARIMAX model is specified as SARIMAX(p,d,q)(P,D,Q,m).
After testing the model multiple times, we conclude that order = (1,0,1) and seasonal order = (1,1,0,12) give the best fit for the SARIMAX model. In the seasonal order, m=12 represents the 12 months in a year, and D=1 represents the seasonal first difference we took to make the series stationary and remove seasonality. Taking just the first difference of the Train data was not sufficient, as it did not remove the seasonality, so d=0 and D=1.
We introduce ‘Bank Rate’ as the exogenous variable and use a lagged value of it (by 5 quarters, i.e. 15 months) in our model, since inflation does not react to changes in monetary policy contemporaneously; it takes time for the effects to play out in the economy.
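
The repeated testing of candidate orders is not shown in the notebook; a minimal sketch of how such a search could be run by comparing AIC values (the candidate grids below are illustrative, and the exogenous regressor is omitted for brevity) is:

import itertools

#Candidate non-seasonal and seasonal orders (illustrative grids)
pdq = [(p, 0, q) for p in [0, 1, 2] for q in [0, 1, 2]]
seasonal_pdq = [(P, 1, Q, 12) for P in [0, 1] for Q in [0, 1]]
best_aic, best_order, best_seasonal = float("inf"), None, None
for order, seasonal_order in itertools.product(pdq, seasonal_pdq):
    try:
        res = SARIMAX(Train["Monthly_Rate"], order=order, seasonal_order=seasonal_order,
                      enforce_stationarity=False).fit(disp=False)
    except Exception:
        continue  #skip combinations that fail to estimate
    if res.aic < best_aic:
        best_aic, best_order, best_seasonal = res.aic, order, seasonal_order
print(best_order, best_seasonal, best_aic)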

#Introducing a lag of 5 quarters (15 months)
sar = pd.DataFrame(Train["Bank Rate"])
sar["Bank Rate Lagged"] = sar["Bank Rate"].shift(15)
#Solving for NAN values
sar["Bank Rate Lagged"].fillna(0, inplace = True)
#SARIMAX Model
y_hat_avg = Valid.copy()
fit1= SARIMAX(Train["Monthly_Rate"], order=(1, 0, 1),seasonal_order=(1,1,0,12), exog = sar["Bank Rate Lagged"],
trend='ct', simple_differencing=False,enforce_stationarity=False).fit()
print(fit1.summary())

The hypotheses for the Ljung-Box test are:
H₀: Residual is white noise
H₁: Residual is not white noise
The “residuals” in a time series model are what is left over after fitting a model. Residuals are useful in checking whether a model has adequately captured the information in the data.

The hypotheses for the Jarque-Bera test are:
H₀: The series is normally distributed
H₁: The series is not normally distributed

The hypotheses for the heteroskedasticity test are:
H₀: The model is homoskedastic
H₁: The model is heteroskedastic

  1. The p-values of all the parameters are close to 0, implying that the estimated coefficients are significant.
  2. The Ljung-Box test does not reject the null hypothesis, i.e. the residuals are white noise. A white noise process is a random process of residuals that are uncorrelated, have mean zero, and a finite variance.
  3. Similarly, the Jarque-Bera test does not reject the null hypothesis, i.e. the residuals are normally distributed.
  4. The heteroskedasticity test does not reject the null hypothesis, i.e. the variance of the error term is constant across all values of the independent variables.

Therefore, the residuals are white noise, data is normally distributed, and the model has constant error variance, all of which are ideal conditions.
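
These diagnostics can also be read directly off the fitted results object; statsmodels exposes each test as a method on the results (shown here for the fit1 model above):

#Ljung-Box test on the standardized residuals (null: no serial correlation)
print(fit1.test_serial_correlation(method='ljungbox'))
#Jarque-Bera test on the residuals (null: normally distributed)
print(fit1.test_normality(method='jarquebera'))
#Break-variance test on the residuals (null: constant variance)
print(fit1.test_heteroskedasticity(method='breakvar'))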

Predicting forecasts

#Preparing Data
start = Valid.index[0]-1 #(=431)
end = Valid.index[-1]+1 #(=721)
x = data.loc[start:,"Bank Rate"]
x = x.shift(15)
x.fillna(0, inplace = True)
pred = pd.DataFrame(x)
#Predicting inflation
y_hat_avg['SARIMAX'] = fit1.predict(start=start, end=end, dynamic=True, exog = pred)
#Correcting for NAN values
Valid.Monthly_Rate=np.where(np.isnan(Valid.Monthly_Rate), 0, Valid.Monthly_Rate)
y_hat_avg.SARIMAX=np.where(np.isnan(y_hat_avg.SARIMAX), 0, y_hat_avg.SARIMAX)
#Root mean square and standard deviation
rms = sqrt(mean_squared_error(Valid.Monthly_Rate, y_hat_avg.SARIMAX.dropna()))
std=Valid.Monthly_Rate.std()
error= round(round(rms,4) - round(std,4),4)
#Plotting the residual density and estimating its distribution
residuals = pd.DataFrame(fit1.resid)
fig, ax = plt.subplots(1,2)
fig.subplots_adjust(hspace=0.3, wspace=0.4)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])
plt.show()
print(residuals.describe())

The root mean squared error between the inflation predicted by SARIMAX and the actual inflation in the Valid data set is 0.8959. The standard deviation of inflation in the Valid data set is 0.8347. The difference between the two is 0.0612, and thus our results are robust.

The mean of the residuals is 0.006, which is exceptionally close to 0, and the density plot resembles a normal distribution, supporting our belief that the model is accurate.

Predicting Bank Rate using ARIMA for the period 2020–2040

We create and fit an ARIMA model with order = (1,0,0) to the Bank Rate series.

#Preparing data
date_list = []
start = datetime.datetime(2020,2,1)
for x in range(0, 251):
    dates = (start + relativedelta(months=x)).strftime("%Y-%m-%d")
    date_list.append(dates)
col = pd.Series(date_list)
df = pd.DataFrame()
df['observation_date'] = col
data = pd.concat((data,df))
data.reset_index(inplace=True)
data['Bank Rate'].fillna(0,inplace= True)
#ARIMA Model
fit2= ARIMA(data["Bank Rate"], order=(1, 0, 0)).fit(disp=0)
print(fit2.summary())

Predicting Bank Rates for period 2020–2040 and concatenating the predicted results of Bank Rates in our original DataFrame.

#Predicting Bank Rates
pred_uc = fit2.predict(start = 721, dynamic=True)
results = pd.Series(pred_uc)
df = pd.DataFrame()
df['Bank Rate'] = results
data.loc[721:, 'Bank Rate'] = df['Bank Rate']

Predicting Monthly Inflation using SARIMAX Model for the period 2020–2040

We choose the order = (1,0,1) and the seasonal order = (1,1,0,12) for the SARIMAX model.

fit3 = SARIMAX(data["Monthly_Rate"],order=(1,0,1),
seasonal_order=(1,1,0,12), trend='ct',exog=data["Bank Rate"].shift(16).fillna(0),simple_differencing=False, enforce_stationarity=False).fit()

Predicting the future Indian monthly inflation rate using the forecasted bank rates in the SARIMAX Model and concatenating the predicted results in our original DataFrame to plot them.

forecast = fit3.predict(start=722, dynamic= True, exog= df) 
inflation = pd.Series(forecast)
inf_rate = pd.DataFrame()
inf_rate['Monthly_Rate'] = inflation
data.loc[721:, 'Monthly_Rate'] = inf_rate['Monthly_Rate']

SUMMARY:

  1. We studied the SARIMAX extension of ARIMA that explicitly models the seasonal element in multivariate data and predicted the Indian monthly inflation rates by identifying the best SARIMAX model.
  2. We predicted Bank Rates by identifying the best ARIMA model using univariate input features for future periods.
  3. We predicted the Indian monthly inflation rates by using SARIMAX model using multivariate input features for future periods.

2. Economic Modelling using NKPC

We check the validity of the New Keynesian Phillips Curve (NKPC) in the Indian context.

We have inflation data for 1960–2020. We divide this time period into three parts.
Train: 1960–1990
Valid: 1991–2000.
Forecast: 2001–2020

  • Reverse engineer the values of real marginal cost for the period 1960–2000.
  • On the basis of the calculated values for the Train data set, predict values for Valid data set. If it fits well, continue prediction for Forecast data set.
  • Calculate inflation using NKPC. Check the values of actual inflation against forecasted inflation to examine the validity of the model.

Restrict the data to the year 2000. We will forecast inflation for the rest of the years.

df = data[data['Year'] < 2001]

BUILDING THE MODEL

We define β (the discount factor, close to 1) and λ (the coefficient on real marginal cost, λ ∈ (0,1)) as two constants. You can change these values to check for varying results. Substituting the values of inflation in periods t and t+1 gives the values of the real marginal cost.
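
The purely forward-looking NKPC relates current inflation to expected future inflation and real marginal cost; solving it for marginal cost gives the expression implemented below (with realized inflation at t+1 standing in for the expectation):

πₜ = β·Eₜ[πₜ₊₁] + λ·mcₜ,  so that  mcₜ = (πₜ − β·πₜ₊₁) / λ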

#Defining parameters
beta = 1
lam = 0.03
#Calculating Marginal Cost
x= df.Monthly_Rate
z=list()
for i in range(0, len(df.Monthly_Rate)-1):
    y = (x[i] - (beta * x[i+1])) / lam
    z.append(y)
#Adding Marginal Cost to original dataframe
df1= pd.DataFrame({'mcr':z})
df['mcr']= df1['mcr']

Visualising Marginal Cost

Splitting the data into three time periods: Train, Valid and Forecast

Train= df.loc[(df['Year'] >= 1960) & (df['Year'] <= 1990)]
Valid= df.loc[(df['Year'] >= 1991) & (df['Year'] <= 2000)]
Forecast= data.loc[(data['Year'] >= 2001) & (data['Year'] <= 2020)]

In order to make the series stationary we use the differencing approach. On trying multiple orders of differencing, we found that first-order differencing gave us the best results.

#Preparing the data 
Train['First_Diff'] = Train['mcr'] - Train['mcr'].shift(1)
#Plotting and testing
TestStationaryPlot(Train["First_Diff"].dropna(inplace=False))
adf_test(Train["First_Diff"].dropna())

On plotting the rolling mean and rolling standard deviation for the real marginal cost of the first-differenced Train data, we see a stabilised rolling mean, suggesting that this data can be used for trend analysis. The p-value is also very small, hinting at the suitability of the data. We would, however, still need to check the rolling mean of the seasonally differenced data.

Checking and eliminating seasonality

# Prepare data
years = widget.value
years = list(range(years[0],years[1]+1))
# Draw Plot
fig = go.Figure()
for i, y in enumerate(years):
    if i < len(years):
        fig.add_trace(go.Scatter(
            x=df[df['Year']==years[i]].Month.tolist(),
            y=df[df['Year']==years[i]].mcr.round(2).tolist(),
            name=str(years[i])))
#Layout changes
fig.update_layout(title=go.layout.Title(
text="Seasonal Trend of MCR",
xref="paper",
x=0),
margin=dict(l=10, r=0, t=50, b=50), xaxis=go.layout.XAxis(title= go.layout.xaxis.Title(text="Month",font=dict(family="Courier New, monospace", size=18, color="#7f7f7f"))), yaxis= go.layout.YAxis(title=go.layout.yaxis.Title(text="mcr",font=dict(
family="Courier New, monospace",
size=18, color="#7f7f7f"))))
fig.update_yaxes(nticks=10)
fig.show()

It can be seen that the line graphs for multiple years follow similar trends: they move upward or downward at the same time and with similar intensity. This shows the presence of an evident seasonal trend that we are required to remove before proceeding to modelling.

After testing for stationarity with multiple orders, we find that seasonal first differencing gives the best results and takes care of seasonality present in the data.
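
The construction of the Seasonal_First_Diff column is not shown in the notebook; a minimal sketch that mirrors the transformation applied to Monthly_Rate earlier (a lag-12 seasonal difference followed by a first difference) would be:

#Assumed construction, mirroring the earlier Monthly_Rate transformation
seasonal_diff = Train['mcr'] - Train['mcr'].shift(12)
Train['Seasonal_First_Diff'] = seasonal_diff - seasonal_diff.shift(1)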

TestStationaryPlot(Train["Seasonal_First_Diff"].dropna(inplace=False))
adf_test(Train["Seasonal_First_Diff"].dropna(inplace=False))

The results give us a series that is as stationary as the first-differenced data, but is better in the sense that it also takes care of the seasonality.

SARIMA Model Prediction for years 1990–2000

#Correcting for NAN values
Train.Seasonal_First_Diff=np.where(np.isnan(Train.Seasonal_First_Diff),0,Train.Seasonal_First_Diff)
#SARIMA Model
y_hat_avg = Valid.copy()
fit1 = sm.tsa.statespace.SARIMAX(Train.mcr.dropna(), order=(1,0,1), seasonal_order=(1,0,1,12), simple_differencing=True, enforce_stationarity=True).fit()
print(fit1.summary())

The p-values of all the parameters are close to 0, implying that the estimated coefficients are significant.

  1. The Ljung-Box test does not reject the null hypothesis, i.e. the residuals are white noise.
  2. Similarly, the Jarque-Bera test does not reject the null hypothesis, i.e. the residuals are normally distributed.
  3. The heteroskedasticity test does not reject the null hypothesis, i.e. the variance of the error term is constant across all values of the independent variables.
  4. Therefore, the residuals are white noise, the data is normally distributed, and the model has constant error variance, all of which are ideal conditions.

Predicting Marginal Cost using SARIMA for period 1990–2000

#Preparing Data
start1 = Valid.index[0] #(=372)
end1 = Valid.index[-1] #(=491)
#Predicting Real Marginal Cost
y_hat_avg['SARIMA'] = fit1.predict(start=start1, end=end1, dynamic=True)
Valid['Predicted'] = y_hat_avg['SARIMA']
#Visualising the forecasts
fig= go.Figure()
fig.add_trace(go.Scatter(dict(x=Train.observation_date,y=Train["mcr"],mode='lines',name= 'Train')))
fig.add_trace(go.Scatter(dict(x=Valid.observation_date,y=Valid['mcr'],mode='lines',name= 'Valid')))
fig.add_trace(go.Scatter(dict(x=Valid.observation_date,y=Valid['Predicted'],mode='lines',name= 'Predicted')))
fig.update_layout(title=go.layout.Title(
text="SARIMA Prediction of mcr",
xref="paper",x=0),
margin=dict(l=10, r=0, t=50, b=50),
xaxis=go.layout.XAxis(title=go.layout.xaxis.Title(text="Time", font=dict(family="Courier New, monospace",size=18, color="#7f7f7f"))),
yaxis=go.layout.YAxis(title=go.layout.yaxis.Title(text="mcr", font=dict(family="Courier New, monospace",size=18,
color="#7f7f7f"))))
fig.show()
#Correcting for NAN values
Valid.mcr = np.where(np.isnan(Valid.mcr), 0, Valid.mcr)
y_hat_avg.SARIMA = np.where(np.isnan(y_hat_avg.SARIMA), 0, y_hat_avg.SARIMA)
#Root mean square
rms1 = sqrt(mean_squared_error(Valid.mcr, y_hat_avg.SARIMA.dropna()))
std1 = Valid.mcr.dropna().std()
#Plotting the residual density and estimating its distribution
residuals = pd.DataFrame(fit1.resid)
fig, ax = plt.subplots(1,2)
fig.subplots_adjust(hspace=0.3, wspace=0.4)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])
plt.show()
print(residuals.describe())

The root mean squared error between the marginal cost predicted by SARIMA and the marginal cost in the Valid data set is 28.6344. The standard deviation of the marginal cost in the Valid data set is 33.1506. The difference between the two is -4.5162, implying that our results are robust.

Forecasted Marginal Cost for period 2001–2020

Predict MCR for years 2001–2020 and then concatenate the series in the ‘Forecast’ dataframe which has the same time period.

#Preparing the data
start2 = end1
end2 = data.index[-1] #(=720)
pred = fit1.predict(start=start2, end=end2, dynamic=True)
Forecast['mcr']=pd.Series(pred)

Here we plot the originally estimated MCR against the MCR forecast.
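
Such a comparison can be drawn with the same plotly pattern used throughout; a minimal sketch, assuming the df and Forecast frames built above:

fig = go.Figure()
#Marginal cost reverse-engineered from observed inflation (1960-2000)
fig.add_trace(go.Scatter(x=df.observation_date, y=df['mcr'], mode='lines', name='Estimated MCR'))
#Marginal cost forecasted by the SARIMA model (2001-2020)
fig.add_trace(go.Scatter(x=Forecast.observation_date, y=Forecast['mcr'], mode='lines', name='Forecasted MCR'))
fig.update_layout(title="Estimated vs Forecasted MCR", xaxis_title="Year", yaxis_title="mcr")
fig.show()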

Calculating inflation using the forecasted MCR values

Reverse engineer monthly inflation rates by substituting the forecasted marginal cost values into the NKPC equation. This is done by taking the last entry in the Valid data set, indexed 491 in the original dataset, as inflation at time period ‘t’ and calculating the expected inflation in period ‘t+1’ using the forecasted marginal cost of period ‘t’. Repeating this process, using the inflation at period ‘t+1’ to calculate expected inflation at period ‘t+2’ and so on, we can predict the monthly inflation rates for the period 2001–2020.

#Add a column with NaN values in the Forecast dataframe
Forecast['forecast'] = pd.Series()
#Using the NKPC formula, find expected inflation for the forecast period
Forecast.loc[492, 'forecast'] = data.loc[491, 'Monthly_Rate'] - lam * Valid.loc[491, 'mcr']
for i in range(492, 720):
    Forecast.loc[i+1, 'forecast'] = Forecast.loc[i, 'forecast'] - (lam * Forecast.loc[i, 'mcr'])
for i in range(492, 720):
    Forecast.loc[i, 'forecast'] = Forecast.loc[i, 'forecast'] / beta
Forecast

Plotting Original data against the Forecasted Data

We see that the forecasted inflation values lie consistently below the actual inflation, and the forecasted trend does not match the original values. This suggests that the NKPC is not a sound model for India, as it fails to correctly predict Indian inflation rates.

CONCLUSION

This article demonstrated how to use Python to forecast inflation rates using the SARIMAX model and the NKPC economic model.

The New Keynesian Phillips Curve is a dominant model of inflation dynamics, but empirical evidence suggests that there is a need for a more evolved and realistic model of inflation forecasting. Rejections of the NKPC do not necessarily imply that the model is invalid; rather, they cast doubt on the strong assumption of the Rational Expectations Hypothesis (REH), since studies show that expectations are not fully rational and cannot account for exogenous shocks to an economy.

SARIMAX, on the other hand, is a simple yet powerful model. It assumes that historical values dictate present behaviour: forecasted inflation rates are based on autoregressive, integrated and moving-average components. It also assumes that the data contains no anomalies, is stationary, and that the model parameters and the error variance are constant. SARIMAX does not take into account stresses in market data, economic and political conditions, or the correlations among risk factors, but the procedure demonstrated above can be useful for forecasting movements in inflation under normal conditions in which past behaviour dictates present values.
