
Forecasting Gold Prices Using an ARIMA Time Series Model

Introduction

Gold is one of the oldest and most popular investments. Investors have long been interested in gold prices and have used a variety of analysis methods to predict how they evolve over time. In this article, we discuss using the ARIMA model to forecast the price of gold.

What is ARIMA?

ARIMA stands for Auto-Regressive Integrated Moving Average. It is a method for modeling time series data for forecasting. A time series is a sequence of data points indexed in time order. Forecasting refers to predicting future data points in the series.

The theory behind ARIMA is beyond the scope of this article; here we focus on a practical, step-by-step application.

Loading and Preparing the Data

For this application, we used data from Kaggle. The dataset contains 1,718 rows and 80 columns, spanning from November 18, 2011, to January 1, 2019. The relevant columns for this study are 'Date' and 'Close', representing the date and closing price, respectively.

Splitting data into training and testing sets is standard practice. In this application, we use 80% of the data for training and 20% for testing.

# Load the dataset (the path below is a placeholder; point it at your copy of the Kaggle CSV)
import pandas as pd

data = pd.read_csv('gold_prices.csv', parse_dates=['Date'])

# Split the data into training and testing sets (80% train / 20% test)
split_index = int(len(data) * 0.8)

# Split the DataFrame
df_train = data.iloc[:split_index]
df_test = data.iloc[split_index:]

Stationarity Testing

Stationarity is an important property in time series analysis. A time series is stationary if parameters such as mean and variance do not change over time.


To determine whether our time series is stationary, we use the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series is non-stationary. The test returns a p-value: if it is below 0.05, we reject the null hypothesis and treat the series as stationary; otherwise, we do not.

from statsmodels.tsa.stattools import adfuller

# Perform Augmented Dickey-Fuller test for stationarity
result = adfuller(df_train['Close'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
                

In our case, the p-value is 0.489, indicating that our time series is not stationary.

Differencing and Rechecking for Stationarity

Since the time series is not stationary, we cannot directly apply ARIMA. One way to address this is by differencing the time series.

# First differencing
data_diff = df_train['Close'].diff().dropna()
                

After differencing, we recheck if the new time series is stationary using the ADF test. In this case, the series became stationary after the first differencing, meaning the differencing order d = 1.

ACF and PACF Plots to Determine ARIMA Parameters

Now, we will plot ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) to determine the moving average (q) order and lag order (p), respectively.

[Figures: Partial Autocorrelation Function (PACF) and Autocorrelation Function (ACF) of the differenced series]

From the figures, we conclude that the lag order p = 1 and the moving average order q = 1.

Modeling with ARIMA

After determining the ARIMA parameters (p, d, q), we can proceed to train our model.

from statsmodels.tsa.arima.model import ARIMA

# Model parameters determined by inspecting the ACF and PACF plots
p = 1  # lag (AR) order
d = 1  # differencing order
q = 1  # moving average (MA) order

# Define the ARIMA model on the training set
model = ARIMA(df_train['Close'], order=(p, d, q))

try:
    model_fit = model.fit()
except Exception as e:
    print(f"Model fitting failed: {e}")
    exit()

Forecasting and Plotting Results

Now that the model is trained, we use it to forecast values over the testing period.

# Forecast the future values
forecast_steps = len(df_test['Close'])
forecast = model_fit.forecast(steps=forecast_steps)
                

For visual interpretation, we will plot the training set, testing set, and forecasted values.

[Figure: training set, testing set, and forecasted values]

Evaluating the Model's Performance

By calculating the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), we can evaluate our model. In our case, MAE = 3.9476, MSE = 21.627, and RMSE = 4.650. These errors are relatively low compared to the data values, which range between 100 and 200, suggesting that the model performs well.
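These metrics can be computed with scikit-learn and NumPy. A minimal sketch on small placeholder arrays (in practice, the inputs would be df_test['Close'] and the ARIMA forecast):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder arrays standing in for the actual test set and forecast
y_true = np.array([151.0, 152.5, 150.8, 153.2])
y_pred = np.array([150.2, 151.9, 151.5, 152.6])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is just the square root of MSE

print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```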

In the financial context, making accurate predictions is challenging due to volatility. However, more advanced models, such as SARIMA or hybrid models combining ARIMA with machine learning methods, may offer improved performance.