Introduction to Data Science with Energy Data: Exploring Relations and Prediction

chantana chantrapornchai
3 min read · Oct 10, 2018

Previously, we introduced how to clean the data set. The data set can be downloaded from http://datascience.ku.ac.th/16-2/.

In this step, we take a multivariate approach. Let's see which power meter attributes affect the consumed active energy. First, we focus on finding correlations between these attributes and the target, the consumed active energy. Admittedly, we do not have much domain knowledge about electrical systems.

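In statistics, we compute the covariance, which measures how two variables X and Y vary together. It is positive when a change in X tends to come with a change in Y in the same direction, and negative when they tend to move in opposite directions.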
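To make the idea concrete, here is a minimal sketch that computes the sample covariance by hand. The arrays are made-up values, not the energy data, and the function name sample_covariance is only for illustration.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1))

# Made-up illustrative values (not taken from the energy data set).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_same = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # rises when x rises
y_opposite = y_same[::-1]                       # falls when x rises

cov_same = sample_covariance(x, y_same)          # positive
cov_opposite = sample_covariance(x, y_opposite)  # negative
print(cov_same, cov_opposite)
```

The sign tells the direction of the joint movement, but the magnitude depends on the units of X and Y, which is why the correlation coefficients are easier to compare.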

An indicator that shows how strongly two variables are related is called a correlation coefficient. There are several kinds. The most basic one is the Pearson correlation coefficient, developed by Karl Pearson in the 1890s. It measures the linear relationship between two variables. Secondly, the Spearman correlation coefficient captures not just linear relationships but any monotonic association (increasing or decreasing) between two variables. It is computed on the rank order of the values rather than on the values themselves, and the simple form assumes each value has a unique rank (no ties). Thirdly, Kendall's Tau looks at all pairs of (X, Y) observations and counts how often X and Y order a pair the same way. It depends only on this directional agreement, not on the size of the rank differences.
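The difference between the three coefficients is easiest to see on a small example. The sketch below implements them directly from their definitions with NumPy, assuming there are no tied values. The data are made up (y is the cube of x, so the relation is monotonic but not linear), and the function names are chosen for illustration only.

```python
from itertools import combinations

import numpy as np

def pearson(x, y):
    """Pearson coefficient: strength of the linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Rank positions 0..n-1; valid only when all values are distinct."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's Tau (no ties): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        agreement = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# Made-up data: monotonic but clearly non-linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

r_pearson = pearson(x, y)       # below 1 because the relation is not linear
r_spearman = spearman(x, y)     # 1 because the relation is monotonic
r_kendall = kendall_tau(x, y)   # 1 because every pair agrees in direction
print(r_pearson, r_spearman, r_kendall)
```

On real data with ties, a library routine that handles ties would be a better choice than this hand-written version.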

A nice way to view the correlations is a heat map. Figure 1 shows the heat map for the Pearson method.

Figure 1: Pearson correlation for all attributes.
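A heat map like this can be produced in a few lines. The following is only a sketch of one possible way, using pandas and matplotlib on a made-up data frame. The column names a, b and c and the function plot_corr_heatmap are hypothetical and do not come from the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_corr_heatmap(corr):
    """Draw a correlation matrix (a square DataFrame) as a heat map."""
    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return fig, ax

# Made-up data standing in for the meter attributes.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
frame = pd.DataFrame({
    "a": base,
    "b": 2.0 * base + rng.normal(scale=0.1, size=200),  # almost a copy of a
    "c": rng.normal(size=200),                          # unrelated to a
})

corr = frame.corr(method="pearson")
fig, ax = plot_corr_heatmap(corr)
```

In a notebook the figure is displayed by calling plt.show(). For the Spearman view, passing method="spearman" to corr should be the only change needed.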

Since there are many attributes, we need to find the ones that correlate most strongly with the target. The feature selection area offers several approaches, including PCA. Here we consider a simple one: select the top 3 attributes with the highest correlation.

 
print(data.columns)
# Print the top 3 features most correlated with the consumed active energy (kW).
# getTopKCorr is a helper function whose definition is not shown here.
features = getTopKCorr(data, 3)
print(features)

Output:
Consumed_apparent_energy_kVAh                 0.981747
Consumed_inductive_reactive_energy_kvarhL     0.528266
Consumed_capacitive_reactive_energy_kvarhC    0.068560
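The definition of getTopKCorr is not shown in this post. As a rough idea of what such a helper could look like, here is a hypothetical stand-in named top_k_corr. It ranks columns by the absolute value of their Pearson correlation with a target column, which is an assumption on our part, and the toy data frame is made up.

```python
import pandas as pd

def top_k_corr(frame, target, k):
    """Hypothetical helper: the k columns most correlated with `target`.

    Columns are ranked by absolute Pearson correlation (an assumption),
    and the signed coefficients are returned.
    """
    corr = frame.corr(method="pearson")[target].drop(labels=target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(k)

# Made-up toy data, not the energy data set.
toy = pd.DataFrame({
    "target": [1, 2, 3, 4, 5, 6],
    "nearly_linear": [2, 4, 6, 8, 10, 13],
    "inverse": [6, 5, 4, 3, 2, 1],
    "alternating": [1, -1, 1, -1, 1, -1],
})

top = top_k_corr(toy, "target", 2)
print(top)
```

Ranking by absolute value keeps strongly negative attributes in the list. If only positive relations are of interest, the abs() call can be dropped.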

Next, we view the Spearman correlation for these three attributes. All of them show a monotonically increasing relationship.

Figure 2: Spearman correlation of three attributes.

Suppose we take the three attributes above. We can build a simple linear regression model to predict the target from them. Here, series_new1 contains only these three fields.

Figure 3: Simple prediction with 3 attributes.
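For readers who want to try the idea without the notebook, the sketch below fits a linear regression with three input attributes using ordinary least squares in NumPy. This is not necessarily how the original notebook does it. The data are synthetic, and X only plays the role that series_new1 plays above.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones = intercept
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted intercept and coefficients to new rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic data: three attributes and a target that is linear in them
# plus a little noise (not the energy data set).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_coef = np.array([1.0, 2.0, -0.5, 3.0])  # intercept first
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.01, size=100)

# Hold out the last 20 rows to check the prediction.
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

coef = fit_linear_regression(X_train, y_train)
pred = predict(coef, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(coef, rmse)
```

For time-ordered data such as meter readings, splitting by position as done here (earlier rows for training, later rows for testing) avoids training on the future.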

That is a simple prediction method. As noted, there are many ways to select features, and whichever is used, the selected features feed into linear regression in the same manner. Note that in the previous post we removed noise only from the target variable Consumed_active_energy_kW, not from the three attributes Consumed_apparent_energy_kVAh, Consumed_inductive_reactive_energy_kvarhL, and Consumed_capacitive_reactive_energy_kvarhC. Thus, the prediction is still affected by the noise remaining in these attributes.

The Jupyter notebooks for both parts are on GitHub:

Next, we consider time series prediction:

https://medium.com/@chantrapornchai/finding-autoregression-with-energy-data-d752d367d1c5

