Introduction to Data Science with Energy Data: Exploring Relations and Prediction

3 min read · Oct 10, 2018

Previously, we introduced how to clean the data set. The data set can be downloaded from http://datascience.ku.ac.th/16-2/.

In this step, we take a multivariate approach. Let’s see which power meter attributes affect the consumed active energy. First, we focus on finding correlations between these attributes and the target, the consumed active energy. Honestly, we do not know much about electrical engineering.

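In statistics, we can compute the covariance, which measures how two variables X and Y vary together. It is positive when X and Y tend to change in the same direction and negative when they tend to change in opposite directions.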
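As a small illustration of this idea, the sample covariance can be computed by hand. This is only a sketch on made-up numbers, not on the energy data, and the function name sample_covariance is chosen here for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up values for illustration only (not the energy data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises when x rises
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls when x rises

cov_up = sample_covariance(x, y_up)      # positive: same direction
cov_down = sample_covariance(x, y_down)  # negative: opposite direction
print(cov_up, cov_down)
```

The sign tells the direction of the relationship, but the magnitude depends on the units of X and Y, which is why a normalized correlation coefficient is usually easier to compare.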

Indicators that show how strongly two variables are related are called correlation coefficients. There are several kinds. The most basic one is the Pearson correlation coefficient, developed by Karl Pearson about 120 years ago. It measures the linear relationship between two variables. Secondly, the Spearman correlation coefficient captures not just a linear relationship but any monotonic association between two variables (a value near +1 means Y tends to increase as X increases). It is computed on the rank order of the values rather than on the values themselves, and its simple form assumes each value has a unique rank. Thirdly, Kendall’s Tau only counts how many pairs of (X, Y) observations agree or disagree in direction, so it depends on the ordering but not on the rank values themselves.
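To make the difference between the three coefficients concrete, here is a minimal sketch that implements each one from its definition in plain Python. The data are made up (y is the cube of x, which is monotonic but not linear), the function names are chosen here for illustration, and the rank and Kendall functions assume there are no ties.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values, assuming every value is unique (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Kendall's tau (no ties): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: monotonic increasing but not linear
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_kendall = kendall_tau(x, y)
# Pearson is below 1 here, while Spearman and Kendall are both 1
print(round(r_pearson, 3), r_spearman, r_kendall)
```

With pandas, the same three measures are available through DataFrame.corr(method=...) with "pearson", "spearman" or "kendall", which also handles ties.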

A nice way to view the correlations is a heat map. Figure 1 is the heat map of the Pearson correlations.

Figure 1: Pearson correlation for all attributes.
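A heat map like this can be drawn with pandas and matplotlib. The code below is a sketch on a small synthetic DataFrame, since the plotting code is not shown here. The column names a, b, c and the function name plot_corr_heatmap are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_corr_heatmap(df, method="pearson"):
    """Draw the correlation matrix of df as a heat map; return (matrix, axes)."""
    corr = df.corr(method=method)
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so colors are comparable across plots
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_yticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label=method + " correlation")
    fig.tight_layout()
    return corr, ax


# Synthetic stand-in for the energy data (hypothetical columns a, b, c)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),  # strongly related to a
    "c": rng.normal(size=200),                     # unrelated noise
})

corr, ax = plot_corr_heatmap(demo)
# In a notebook the figure is displayed automatically; in a script call plt.show()
```

On the real data, the same function would be called on the cleaned DataFrame instead of demo.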

Since there are many attributes here, we have to find the ones that are most strongly correlated with the target. The feature selection area offers several approaches, and dimensionality reduction techniques such as PCA are also commonly used. Let’s consider a simple one, where we select the top 3 attributes with the highest correlation.

 
# Print the top 3 features for the consumed energy (kW).
# getTopKCorr is a helper that is not shown in this post.
print(data.columns)
features = getTopKCorr(data, 3)
print(features)

Consumed_apparent_energy_kVAh                 0.981747
Consumed_inductive_reactive_energy_kvarhL     0.528266
Consumed_capacitive_reactive_energy_kvarhC    0.068560
…
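The body of getTopKCorr is not shown here. As a rough idea of what such a helper might do, the sketch below ranks the numeric columns by the absolute value of their Pearson correlation with a target column and keeps the top k. The function name top_k_corr, its parameters, the ranking by absolute value, and the toy columns are all assumptions for illustration, not the actual helper.

```python
import pandas as pd


def top_k_corr(df, target, k, method="pearson"):
    """Return the k columns most correlated (by absolute value) with target."""
    numeric = df.select_dtypes(include="number")
    # Correlation of every column with the target, excluding the target itself
    corr = numeric.corr(method=method)[target].drop(labels=target)
    order = corr.abs().sort_values(ascending=False).index[:k]
    return corr.loc[order]


# Toy data with hypothetical column names (not the energy data)
toy = pd.DataFrame({
    "target": [1, 2, 3, 4, 5, 6, 7, 8],
    "strong": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1],  # almost linear
    "medium": [2, 1, 4, 3, 6, 5, 8, 7],                  # roughly increasing
    "weak": [5, 1, 4, 8, 2, 7, 3, 6],                    # little relation
})

top = top_k_corr(toy, "target", 2)
print(top)
```

Ranking by absolute value also keeps strongly negative correlations, which can be just as useful for prediction as positive ones.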

We take these three attributes and view their Spearman correlation. All of them show a monotonically increasing relationship.

Figure 2: Spearman correlation of three attributes.

Suppose we keep the three attributes above. We can then build a simple linear regression model for prediction. Here, series_new1 contains only these three fields.

Figure 3: Simple prediction with 3 attributes.
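For readers who want to try the idea without the data set, here is a minimal sketch of a linear regression on three features using ordinary least squares in NumPy. Everything in it is an assumption for illustration: the data are synthetic, the coefficients are made up, and the function names fit_linear and predict are hypothetical. It is not the code behind Figure 3.

```python
import numpy as np

# Synthetic stand-in for three selected attributes and a target
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
true_coef = np.array([3.0, 1.5, 0.2])  # made-up coefficients
y = X @ true_coef + 0.5 + rng.normal(scale=0.1, size=n)

# Keep the first 150 rows for training and the rest for testing
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (weights, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, weights, intercept):
    """Apply the fitted linear model to new rows."""
    return X @ weights + intercept


weights, intercept = fit_linear(X_train, y_train)
y_pred = predict(X_test, weights, intercept)
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(weights, intercept, rmse)
```

On the energy data, X would hold the three selected attributes and y the consumed active energy. Because the readings are ordered in time, splitting by position as above avoids training on values that come after the test period.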

That is a simple prediction method. As noted, there are many ways to select features, and the selected features are used for linear regression in the same manner. Note that we have not yet removed noise from the three attributes Consumed_apparent_energy_kVAh, Consumed_inductive_reactive_energy_kvarhL, and Consumed_capacitive_reactive_energy_kvarhC. In the previous blog, we only removed noise from the target variable Consumed_active_energy_kW. Thus, the prediction contains some noise from these attributes.

The Jupyter notebooks for both pieces of code are on GitHub (github.com):

cchantra/datascience_tutorial

Next, we consider time series prediction:

https://medium.com/@chantrapornchai/finding-autoregression-with-energy-data-d752d367d1c5

Written by chantana chantrapornchai
