Energy consumption prediction with Auto-ARIMA

chantana chantrapornchai
3 min read · Nov 12, 2018

In the previous article, https://medium.com/@chantrapornchai/arima-for-energy-consumption-data-part-ii-ac779b40586e, we fitted an autoregressive model to the energy data using ARIMA from statsmodels. Choosing p, d, and q is difficult even after studying the ACF and PACF plots. One suggestion is to grid-search p, d, and q, as in https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/. That means nested loops over every candidate value of p, d, and q, fitting one model per combination, which is very slow.
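To make the cost of the grid search concrete, here is a minimal sketch of the loop structure. The names grid_search_orders and toy_score are hypothetical and not from the linked article; toy_score is only a stand-in for "fit ARIMA with this order and return its error", so the sketch runs without any data.

```python
import itertools


def grid_search_orders(score_fn, p_values, d_values, q_values):
    """Try every (p, d, q) combination and keep the lowest-scoring one.

    score_fn is a hypothetical callable that fits a model for one order
    and returns an error score (lower is better), or raises on failure.
    """
    best_order, best_score = None, float("inf")
    n_tried = 0
    for order in itertools.product(p_values, d_values, q_values):
        n_tried += 1
        try:
            score = score_fn(order)
        except Exception:
            continue  # skip orders for which the fit fails
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score, n_tried


def toy_score(order):
    # Stand-in for a real model fit; in practice this would train ARIMA(order).
    p, d, q = order
    return (p - 2) ** 2 + (d - 1) ** 2 + (q - 1) ** 2


best_order, best_score, n_tried = grid_search_orders(
    toy_score, p_values=range(0, 5), d_values=range(0, 2), q_values=range(0, 5)
)
print(best_order, best_score, n_tried)
```

Even this small grid needs 50 fits, and each real ARIMA fit on tens of thousands of points is expensive, which is why the search is slow.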

A useful alternative is auto_arima from pyramid-arima, https://www.alkaline-ml.com/pyramid/modules/generated/pyramid.arima.auto_arima.html#pyramid.arima.auto_arima, which works much like auto.arima in R.

The blog posts https://medium.com/@josemarcialportilla/using-python-and-auto-arima-to-forecast-seasonal-time-series-90877adff03c and https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/ show how to use auto_arima, and we follow them for our case study.

auto_arima takes many parameters; see https://www.alkaline-ml.com/pyramid/modules/generated/pyramid.arima.auto_arima.html#pyramid.arima.auto_arima for details. For our case we call it with these initial parameters:

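from pyramid.arima import auto_arima

# Search ARIMA orders with p and q up to 4 and d fixed at 1, using 4 parallel jobs.
stepwise_model = auto_arima(train, start_p=2, start_q=2,
                            max_p=4, max_q=4, m=10,
                            start_P=0, seasonal=False,
                            d=1, max_d=1, D=1, trace=True,
                            error_action='ignore',
                            suppress_warnings=True,
                            stepwise=False, n_jobs=4)
# auto_arima already returns a fitted model; this call refits it on the same training data.
stepwise_model.fit(train)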

Without the n_jobs parameter, the search runs serially. In stepwise mode (stepwise=True) the model should give better accuracy, but this mode cannot run in parallel because of the sequential nature of the computation, so n_jobs has no effect when stepwise=True. To use n_jobs=4 we therefore set stepwise=False.

Setting seasonal=True also requires a suitable seasonal period m. For our data, values of m such as 24 or 30 made the fit so slow and resource-hungry that it never completed. The reason is the size of the data: we have 7x,xxx data points in total, and after taking 67% for training and the remainder for testing, the training set still has 4x,xxx points. Since the data is recorded per minute, predicting 60 minutes ahead means m becomes 60, which is really slow. So, only to demonstrate the concept, we set seasonal=False and m=10 and let it run. A real seasonal model would need seasonal=True as well. Note that most demo data sets are monthly records with a yearly cycle, so they use m=12.

In our case the data covers only two months, but we have cleaned it as demonstrated in https://medium.com/@chantrapornchai/introduction-to-data-science-with-energy-data-part-1-45f49b3682e8.
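The 67% / 33% split described above has to respect time order, so the training set is simply the first part of the series and the test set is the rest. A minimal sketch follows; the helper name chronological_split and the 1,000-point stand-in series are assumptions for illustration, not the real energy data.

```python
def chronological_split(series, train_fraction=0.67):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    split_at = int(len(series) * train_fraction)
    return series[:split_at], series[split_at:]


# Hypothetical stand-in: one reading per minute for 1,000 minutes.
readings = list(range(1000))
train, test = chronological_split(readings)
test_size = len(test)  # number of steps to forecast later
print(len(train), test_size)
```

Keeping the order intact matters because the model is evaluated on how well it forecasts points that come after everything it was trained on.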

After the run finishes, we print the AIC and the summary of the selected model:

print(stepwise_model.aic())
print(stepwise_model.summary())

The trace of the search over the candidate parameters is shown in Figure 1, followed by the summary of the selected model. The selected model is (p=4, d=1, q=4).

Figure 1
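For reference, the value printed by aic() is the Akaike information criterion, which trades goodness of fit against model size. Its standard definition is given below; the symbols k (number of estimated parameters) and \hat{L} (maximized likelihood) are introduced here for illustration and do not appear in the model output.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Lower values are better, so, assuming the search uses AIC as its information criterion, the candidate with the smallest value in the trace is the one that is kept.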

We can save the model to a pickle file.

import pickle

with open(filename, 'wb') as pkl:
    pickle.dump(stepwise_model, pkl)

Next time we can reload it and skip the training step.

with open(model_file, 'rb') as pkl:
    stepwise_model = pickle.load(pkl)

Then we forecast as many periods ahead as there are points in the test set.

future_forecast = stepwise_model.predict(n_periods=test_size)

Figure 2 shows the RMSE of the selected model on the test data.

Figure 2
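The RMSE compares the forecast with the held-out test values point by point. A minimal sketch of the computation is below; the function name rmse and the four sample values are made up for illustration and are not results from the energy data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, only to show the call.
actual = [3.0, 5.0, 2.5, 7.0]
forecast = [2.5, 5.0, 4.0, 8.0]
print(rmse(actual, forecast))
```

In the article's setting, actual would be the test series and predicted would be future_forecast.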

The full code is here.
