There’s a huge amount of interest and a lot of money in big data. But how do we use it with integrity and purpose?
At Quby, we have a team of data scientists working on the most advanced analytics for energy insights. We use data from a large number of households that opt to share their information for research and development purposes. For each household, we collect over 140 different variables with high granularity. This puts us in a unique position to apply the latest technology to improve our products and software.
Below are three ways that we’re making sense of data at Quby.
Python framework for training algorithms
There are a variety of real-time algorithms running on Toon, our smart thermostat and energy display. These handle heating control, temperature compensation, and the alignment of signals. The process we use to develop our algorithms always begins with data collection. This data comes from Toons in the field or from controlled environments, or it is provided by external partners. We have created a framework in Python that allows us to import a wide range of data sources, train algorithms, and visualize algorithmic performance. Our internal libraries can be used across projects to automate the algorithm development process.
For example, one of the most important functions of Toon is accurately sensing and maintaining the temperature in a room. To achieve this, we have developed a temperature compensation algorithm that corrects for heat generated inside the device itself, so that customers see an accurate temperature reading. This compensated temperature is also used by the heating control algorithm to maintain the desired room temperature setpoint. The data used to develop the compensation algorithm includes raw temperature sensor measurements as well as reference sensor data.
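To make the idea concrete, here is a minimal sketch of one way such a compensation could be fitted, assuming the self-heating offset (raw reading minus reference reading) is roughly a linear function of a few internal heat-load proxies. The proxies (for example backlight level or processor load) and the function names `fit_compensation` and `compensate` are hypothetical and are not taken from Toon's actual algorithm.

```python
import numpy as np


def _design_matrix(heat_features):
    """Stack the (hypothetical) heat-load proxies with a constant column for a fixed offset."""
    heat_features = np.asarray(heat_features, dtype=float)
    return np.column_stack([heat_features, np.ones(len(heat_features))])


def fit_compensation(raw_temp, heat_features, reference_temp):
    """Least-squares fit of the self-heating offset (raw minus reference temperature)."""
    offset = np.asarray(raw_temp, dtype=float) - np.asarray(reference_temp, dtype=float)
    weights, *_ = np.linalg.lstsq(_design_matrix(heat_features), offset, rcond=None)
    return weights


def compensate(raw_temp, heat_features, weights):
    """Subtract the predicted self-heating from the raw sensor reading."""
    return np.asarray(raw_temp, dtype=float) - _design_matrix(heat_features) @ weights
```

This static fit ignores thermal lag. Because a device heats up and cools down gradually, a practical model would probably also need lagged or filtered versions of the proxies.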
With so many data sources, one of our major hurdles is importing new data formats into our Python algorithm training framework. We have solved this by creating a library that lets us work with unwieldy formats, including the complex Microsoft Excel workbooks sometimes provided by our partners. All of these formats are imported into a common dataframe structure, which we then annotate with functional events on the device. This shows us which device functions the internal temperature is most sensitive to.
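The sketch below shows what the import-and-annotate step could look like with pandas, under some assumptions. Every source is mapped onto one hypothetical schema (`timestamp`, `sensor_temp`, `reference_temp`), and the partner column names (`Time`, `T_device`, `T_ref`) and sheet name `Measurements` are invented for illustration. Events are attached with `pandas.merge_asof`, which tags each sample with the most recent event at or before it.

```python
import pandas as pd

COLUMNS = ["timestamp", "sensor_temp", "reference_temp"]  # hypothetical common schema


def load_csv(source):
    """Load a CSV export that already uses the common column names."""
    df = pd.read_csv(source, parse_dates=["timestamp"])
    return df[COLUMNS]


def load_partner_excel(source, sheet_name="Measurements"):
    """Load a partner workbook (needs openpyxl for .xlsx) and rename its columns."""
    df = pd.read_excel(source, sheet_name=sheet_name)
    df = df.rename(columns={"Time": "timestamp", "T_device": "sensor_temp", "T_ref": "reference_temp"})
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df[COLUMNS]


def annotate_events(measurements, events):
    """Tag every sample with the most recent device event at or before its timestamp."""
    # merge_asof needs both frames sorted on the key, with identical key dtypes.
    left = measurements.assign(timestamp=measurements["timestamp"].astype("datetime64[ns]"))
    right = events.assign(timestamp=events["timestamp"].astype("datetime64[ns]"))
    return pd.merge_asof(
        left.sort_values("timestamp"),
        right.sort_values("timestamp"),
        on="timestamp",
        direction="backward",
    )
```

Keeping one loader per format and a single shared schema means that downstream training and visualization code never needs to know where the data came from.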
We are now able to automate our algorithm training. When our data covers a range of situations (e.g. extreme ambient temperatures or software upgrades), this automated training acts as a kind of regression test, ensuring that the algorithm performs well in every use case of the display. As our database grows, our algorithms improve further. In the future, we would like to be able to toggle specific datasets on and off for training and to raise warnings about specific datasets.
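As an illustration of training-as-regression-test, here is a minimal sketch assuming each scenario is stored as its own dataset with an `enabled` toggle. The `Dataset` class, the `train_and_check` function, the mean-absolute-error threshold and the linear example model are all hypothetical. The key idea is to score every dataset separately, so that a scenario the algorithm handles badly raises a warning instead of disappearing into an overall average.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Dataset:
    name: str              # hypothetical scenario label, e.g. "cold_chamber"
    features: np.ndarray   # shape (n_samples, n_features)
    target: np.ndarray     # shape (n_samples,)
    enabled: bool = True   # toggle: include this dataset in training and checks


def fit_linear(X, y):
    """Example trainer: ordinary least squares with a bias column."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def predict_linear(weights, X):
    return np.column_stack([X, np.ones(len(X))]) @ weights


def train_and_check(datasets, max_mae, train=fit_linear, predict=predict_linear):
    """Train on all enabled datasets, then report the error on each one separately.

    Returns a dict of per-dataset mean absolute errors and a list of warnings
    for datasets whose error exceeds max_mae.
    """
    active = [d for d in datasets if d.enabled]
    if not active:
        raise ValueError("no datasets enabled")
    X = np.vstack([d.features for d in active])
    y = np.concatenate([d.target for d in active])
    model = train(X, y)
    report, warnings = {}, []
    for d in active:
        mae = float(np.mean(np.abs(predict(model, d.features) - d.target)))
        report[d.name] = mae
        if mae > max_mae:
            warnings.append(f"{d.name}: MAE {mae:.3f} exceeds {max_mae}")
    return report, warnings
```

For brevity, this sketch scores each dataset on the same data used for training. A stricter check would hold out part of each dataset.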
Using k-d trees for finding similar people
At Quby, we like to empower people by informing them about their energy consumption patterns, which helps them reduce their bills. To do this, Toon offers what we call a ‘Benchmark’ functionality.
Users who subscribe to Benchmark are compared to people who are “similar” to them. They receive a daily report based on that comparison, with a rating from “A” to “E”, where “A” means the user is consuming energy efficiently and “E” means the opposite.
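The post does not say how the rating is calculated. The following is only a sketch under an explicit assumption: the five grades are equal-sized bands (quintiles) of the peer group, ranked by energy use, with “A” for the lowest consumers. The function name `benchmark_grade` is hypothetical.

```python
from bisect import bisect_left

GRADES = "ABCDE"


def benchmark_grade(user_kwh, peer_kwh):
    """Return "A".."E" from the share of similar households that used less energy.

    Illustrative assumption: grades are quintiles of the peer group, "A" lowest.
    """
    if not peer_kwh:
        raise ValueError("need at least one peer")
    peers = sorted(peer_kwh)
    lower = bisect_left(peers, user_kwh)      # number of peers strictly below the user
    band = min(lower * 5 // len(peers), 4)    # integer maths avoids float edge cases
    return GRADES[band]
```

For example, with ten peers using 10, 20, …, 100 kWh, a household using 25 kWh sits above two of them and would be graded “B”.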
But what do we mean by “similar”?
Benchmark currently compares users based on characteristics such as their house type or when the house was built.
This system works very well, but we want to improve it so that it gives consumers the best possible advice about where they can save energy. This is why we’ve built a proof of concept using a different comparison algorithm. We tried several approaches, including various clustering algorithms, and in the end we got the best results from k-nearest neighbours.
The k-nearest neighbours algorithm assumes that the data lies in a feature space. For instance, each data point could be a multidimensional vector, optionally with a label showing its class. This makes it possible to compute the distance between any two points, and that distance then represents how “similar” they are: the smaller the distance, the more similar the two households.
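Here is a minimal brute-force sketch of this idea, assuming each household is described by a few numeric features. The example features (build year, floor area in m², and a detached-house flag) are hypothetical. The features are standardized first, because otherwise a large-valued attribute such as build year would dominate the Euclidean distance.

```python
import numpy as np


def standardize(features):
    """Scale each column to zero mean and unit variance.

    Returns the scaled data plus the mean and std, which must also be applied to queries.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (features - mean) / std, mean, std


def k_nearest(features, query, k):
    """Brute-force k-NN: measure the Euclidean distance to every household."""
    distances = np.linalg.norm(features - query, axis=1)
    idx = np.argsort(distances)[:k]
    return idx, distances[idx]
```

Scale a query with the same `mean` and `std` as the stored data, e.g. `(np.array([1972, 118, 0]) - mean) / std`. Each query computes a distance to every stored household, which is the cost described next.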
However, a naive k-nearest neighbours search compares the query against every point in the dataset to find the closest neighbours (or class). This means that for large or high-dimensional datasets, the algorithm is computationally expensive. For that reason, we decided to implement k-nearest neighbours with a k-d tree data structure. A k-d tree, or k-dimensional tree, is a binary tree that recursively partitions k-dimensional space by splitting it along one coordinate axis at each level, so that a search can skip whole regions that cannot contain a closer neighbour.
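The sketch below is a compact pure-Python k-d tree with a k-nearest-neighbour query, intended to show the pruning idea rather than to reproduce Quby’s proof of concept. Note that the k in “k-d tree” is the number of dimensions, while the `k` parameter of `knn_query` is the number of neighbours. The names `KDNode`, `build_kdtree` and `knn_query` are our own.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KDNode:
    point: Sequence[float]
    index: int                   # position of the point in the original list
    axis: int                    # coordinate this node splits on
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points, indices=None, depth=0):
    """Recursively split on the median along axis = depth % dimensions."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(
        point=points[indices[mid]],
        index=indices[mid],
        axis=axis,
        left=build_kdtree(points, indices[:mid], depth + 1),
        right=build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def knn_query(root, query, k):
    """Return the k nearest points as (distance, index) pairs, closest first."""
    if k <= 0:
        return []
    best = []  # max-heap of (-distance, index): best[0] holds the current k-th best

    def visit(node):
        if node is None:
            return
        d = math.dist(node.point, query)
        if len(best) < k:
            heapq.heappush(best, (-d, node.index))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # The far side can only hold a closer point if the splitting plane
        # is nearer than the current k-th best distance.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d, i) for neg_d, i in best)
```

A k-d tree prunes most effectively when the number of dimensions is modest. As dimensionality grows, the far-side check succeeds more often, so the search discards less of the tree. Optimized implementations are available in `scipy.spatial.KDTree` and scikit-learn’s `sklearn.neighbors.KDTree`.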
In summary, Quby’s in-house data science expertise allowed us to identify a better-suited algorithm, which we expect will improve the Benchmark feature in the future.
Energy Insights through big data analytics
Energy consumption is still a black box for most consumers. They find it complicated and uninteresting to engage with, and most of the information available on the topic in the public domain is too generic to be useful. Still, a large majority of consumers would like to know whether they could waste less energy or save money.
Toon collects high-granularity electricity and gas data from a large number of existing users who opt in to share their data for research purposes. The big-data team at Quby uses this data to develop advanced analytics algorithms, mainly in the fields of boiler management, energy efficiency and demand-side management.
Quby’s big-data platform runs on a Hadoop cluster, with models built using Spark and Python.
A typical application of the analytics platform is identifying the energy usage of individual household appliances. For example, by analysing the data from a given household, our big-data platform can tell you not only how much your TV’s standby mode is costing you, but also very specific things, such as whether your freezer needs to be repaired! These technologies allow us to make energy accessible and easy to understand, and they help consumers avoid wasting energy.
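Appliance-level analysis like this is beyond a short example. The following is a much simpler stand-in: it estimates a household’s total always-on (baseload) consumption, which is where standby losses show up, and it does not identify any individual appliance. It assumes total-power readings at regular intervals (for example, one per minute), treats a low percentile of those readings as the baseload, and uses a hypothetical tariff. The name `always_on_cost` and the 5th-percentile choice are illustrative assumptions.

```python
import numpy as np


def always_on_cost(power_watts, tariff_per_kwh, percentile=5.0):
    """Estimate yearly cost of the household's always-on electricity use.

    power_watts: total household power readings taken at regular intervals.
    A low percentile of the readings approximates the baseload, i.e. the power
    drawn even when nothing is actively in use (standby devices, routers, ...).
    """
    baseload_w = float(np.percentile(power_watts, percentile))
    hours_per_year = 365 * 24
    kwh_per_year = baseload_w * hours_per_year / 1000
    return baseload_w, kwh_per_year * tariff_per_kwh
```

For example, a constant 50 W baseload comes to 438 kWh per year, so at a tariff of 0.20 per kWh it costs about 87.60 per year.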
We’ve seen three examples of how we’re making sense of data at Quby. As data volume and complexity keep increasing, we’re constantly working on improving our tools and methods. We’re convinced that data is the key to the future of smart energy and connected devices, and we’re extremely excited to be pushing the boundaries in this field.
Are you looking for an opportunity to build cutting-edge software with huge datasets for a product that consumers love? Check out the Quby careers page now.
- November 21, 2016