Introduction to Data Science with Energy Data: Exploring Relations and Prediction
Previously, we introduced how to clean the data set. The data set can be downloaded from http://datascience.ku.ac.th/16-2/.
In this step, we take a multivariate approach. Let’s see which power meter attributes affect the consumed active energy. First, we focus on finding correlations between these attributes and the target, the consumed active energy. Honestly, we do not have much knowledge of electricity.
In statistics, the covariance measures how two variables vary together: it is positive when a change in X is accompanied by a change in Y in the same direction, and negative when they move in opposite directions.
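As a minimal sketch of this idea, the sample covariance averages the products of the deviations of X and Y from their means. The function name and the numbers below are made up for illustration and are not taken from the energy data set.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up values: y_up rises with x, y_down falls as x rises.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]

cov_up = sample_covariance(x, y_up)      # positive: same direction
cov_down = sample_covariance(x, y_down)  # negative: opposite direction
print(cov_up, cov_down)
```

The sign tells the direction, but the magnitude depends on the units of X and Y, which is why a normalized correlation coefficient is usually easier to compare across attributes.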

The indicator that shows how strongly two variables are related is called a correlation coefficient. There are several kinds of correlation coefficients. The most basic one is the Pearson correlation coefficient, developed by Karl Pearson about 120 years ago. It measures the linear relationship between two variables. Second, the Spearman correlation coefficient captures not just linear relationships but any monotonic association between two variables (a value near +1 means monotonic increasing). Note that it is computed on the rank order of the values rather than the values themselves, and in its simplest form it assumes each value has a unique rank. Third, Kendall’s Tau looks only at directional agreement over all pairs of (X, Y) observations, that is, whether X and Y order each pair the same way, and does not depend on the rank values themselves.
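To make the difference between the three coefficients concrete, here is a small pure-Python sketch. It assumes no tied values (so Kendall’s Tau is the simple tau-a variant), and the data are made up: y = x³ is monotonic increasing but not linear.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks from 1 to n; assumes no ties, so every value has a unique rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman rho: Pearson r computed on the ranks instead of the values."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs (no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: monotonic increasing but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson(x, y)        # below 1, because the relation is not a straight line
rho = spearman(x, y)     # 1, because the rank orders agree exactly
tau = kendall_tau(x, y)  # 1, because every pair is ordered the same way
print(round(r, 3), rho, tau)
```

In practice there is no need to hand-roll these: pandas `DataFrame.corr(method=...)` accepts "pearson", "spearman", and "kendall", and also handles tied values.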
A nice way to view the correlations is a heat map. Figure 1 is the heat map of the Pearson correlations.
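The figure itself is not reproduced here. As a rough sketch of how such a heat map could be drawn, the snippet below uses pandas `DataFrame.corr` and matplotlib `imshow`. The helper name `plot_corr_heatmap`, the synthetic data, and the column names are assumptions for illustration, not the plotting code from the original notebook (seaborn’s `heatmap` is a common alternative).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_corr_heatmap(df, method="pearson"):
    """Draw the correlation matrix of df as a heat map; return (corr, ax)."""
    corr = df.corr(method=method)
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so colors are comparable between plots.
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_yticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label=f"{method} correlation")
    fig.tight_layout()
    return corr, ax


# Synthetic stand-in for the power meter data (column names are made up).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "attr_a": a,
    "attr_b": 2.0 * a + rng.normal(scale=0.1, size=200),  # strongly tied to attr_a
    "attr_c": rng.normal(size=200),                       # independent noise
})
corr, ax = plot_corr_heatmap(demo)
```

With the real data, the call would be `plot_corr_heatmap(data)`, assuming `data` is the cleaned DataFrame with numeric columns.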

Since there are many attributes here, we have to find the attributes that correlate most strongly with the target. Many approaches exist in the feature selection area, including PCA. Let’s consider a simple one, where we select the top 3 attributes with the highest correlations.
print(data.columns)
# Print the top 3 features correlated with the consumed active energy (kW)
features = getTopKCorr(data, 3)
print(features)
Consumed_apparent_energy_kVAh 0.981747
Consumed_inductive_reactive_energy_kvarhL 0.528266
Consumed_capacitive_reactive_energy_kvarhC 0.068560…
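The body of getTopKCorr is not shown in this post. One plausible way to write such a helper, offered only as a sketch, is to rank the columns by their absolute Pearson correlation with the target. The name `get_top_k_corr`, its explicit `target` argument, and the synthetic columns below are assumptions, not the original implementation.

```python
import numpy as np
import pandas as pd


def get_top_k_corr(df, k, target):
    """Return the k columns with the largest absolute Pearson correlation to target."""
    corr = df.corr()[target].drop(target)  # correlation of every column with target
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(k)


# Synthetic stand-in data; the column names are made up for illustration.
rng = np.random.default_rng(1)
target = rng.normal(size=300)
demo = pd.DataFrame({
    "target": target,
    "strong": target + rng.normal(scale=0.1, size=300),
    "medium": target + rng.normal(scale=1.0, size=300),
    "weak": rng.normal(size=300),
})
top = get_top_k_corr(demo, 2, target="target")
print(top)
```

Sorting by absolute value keeps strongly negative correlations in the ranking, which a plain descending sort would push to the bottom.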
Take the three attributes and view their Spearman correlations. All show a monotonic increasing direction.

Suppose we use the three attributes above. We can build a simple linear regression model for prediction, where series_new1 contains only these three fields.
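As a rough sketch of what such a model involves, the snippet below fits ordinary least squares with `numpy.linalg.lstsq` and evaluates it on held-out rows. This is a swapped-in illustration under stated assumptions: the helper names are hypothetical, the array X is synthetic and merely stands in for the three fields of series_new1, and the original notebook may well use a different library.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(X, intercept, coefs):
    """Apply the fitted linear model to new rows."""
    return intercept + X @ coefs


# Synthetic stand-in for the three selected attributes (not the real meter data).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
true_coefs = np.array([0.9, 0.3, 0.05])
y = 1.5 + X @ true_coefs + rng.normal(scale=0.01, size=200)

# Fit on the first 150 rows, evaluate on the remaining 50.
intercept, coefs = fit_linear_regression(X[:150], y[:150])
pred = predict(X[150:], intercept, coefs)
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
print(intercept, coefs, rmse)
```

Holding out rows for evaluation matters here: an error measured on the training rows alone would understate how far off the predictions are on new readings.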

That is a simple prediction method. As noted, there are many ways to select features, and the selected features are used for linear regression in the same manner. Note that we have not yet removed noise from the three attributes Consumed_apparent_energy_kVAh, Consumed_inductive_reactive_energy_kvarhL, and Consumed_capacitive_reactive_energy_kvarhC. In the previous blog post, we only removed noise from the target variable Consumed_active_energy_kW. Thus, the prediction contains some noise from these attributes.
The Jupyter notebooks with the code for both posts are on GitHub.
Next, we consider time series prediction:
https://medium.com/@chantrapornchai/finding-autoregression-with-energy-data-d752d367d1c5