Not all data is created equal: there is some you want to keep and some you want to throw away.
As we established in a previous post, the IoT presents some issues for new participants, who often just gather the data and store it – whether they need it or not. What they may not realize is that the cost of keeping data in short-term storage is typically high. So, if you self-identify as a data hoarder, you may want to consider the alternatives.
It’s all about cost and unknown future use of data
Just as you might keep a gadget in case you need it one day, you might just need the IoT data you considered throwing away because it was too expensive to store. But, unlike with toilet roll hoarding (where you know how it will be used in most cases), with many IoT programs you often can’t foresee future uses of the data when you are starting out.
Short-term versus long-term storage
What do we mean by short-term and long-term storage of data?
- Short-term data storage in IoT scenarios tends to hold data that is collected in real time and often acted upon immediately. For example, when a sensor detects that something is too hot, the reading is used to immediately send an alarm to an engineer to fix it. The data is stored in an operational database within the IoT platform for a short or medium period only, which keeps performance high and storage costs low.
- Long-term data storage is often in a data lake, a purpose-built data container for long-term storage. The data needs to be extracted from the operational database and converted into the right format, both for storage and so that the programs that need to access it can read it. With analytics tools that detect reliable patterns in this data, you can predict when the engineer will need to come before a machine breaks, instead of calling the engineer once it has broken.
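The two paths above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IoT platform works: the `Reading` type, the 80 °C threshold, the 14-day retention window and all function names are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: datetime
    temperature_c: float


# Hypothetical values, chosen for illustration only.
ALERT_THRESHOLD_C = 80.0
RETENTION = timedelta(days=14)


def needs_alert(reading, threshold_c=ALERT_THRESHOLD_C):
    """Short-term path: should this reading trigger an immediate alarm?"""
    return reading.temperature_c > threshold_c


def split_by_age(readings, now, retention=RETENTION):
    """Separate readings that stay in the operational store from those due for archiving."""
    keep, archive = [], []
    for reading in readings:
        if now - reading.timestamp <= retention:
            keep.append(reading)
        else:
            archive.append(reading)
    return keep, archive


def to_lake_record(reading):
    """Long-term path: reshape a reading into a flat record for the data lake."""
    return {
        "sensor_id": reading.sensor_id,
        "date": reading.timestamp.date().isoformat(),  # handy as a partition key
        "timestamp": reading.timestamp.isoformat(),
        "temperature_c": reading.temperature_c,
    }
```

In a real deployment, the archive step would write the records to whatever data lake you use; here it only prepares them.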
What are the costs of your data hoarding versus the value to the business?
- Short-term operational databases often run on high-speed storage devices, which come with high costs. The more data you have and the longer you store it there, the more your costs increase.
- The cost is far lower when the data is stored in long-term data lakes, which are designed for inexpensive storage of large data volumes.
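To get a feel for the difference, you can run the arithmetic yourself. The sketch below compares keeping everything in the operational store with keeping only recent data there and moving the rest to a data lake. The prices and volumes in the usage note are placeholders, not real vendor rates, and the function names are invented for this example.

```python
def storage_cost(gb, price_per_gb_month):
    """Monthly cost of holding `gb` gigabytes at a given price per GB per month."""
    return gb * price_per_gb_month


def compare_monthly_cost(daily_gb, days_kept, hot_days, hot_price, cold_price):
    """Return (all_in_operational_store, tiered) monthly costs.

    daily_gb   -- data collected per day, in GB
    days_kept  -- how many days of data you retain in total
    hot_days   -- how many of those days stay in the operational store
    hot_price  -- operational store price per GB per month (placeholder)
    cold_price -- data lake price per GB per month (placeholder)
    """
    total_gb = daily_gb * days_kept
    hot_gb = daily_gb * min(hot_days, days_kept)
    cold_gb = total_gb - hot_gb
    all_hot = storage_cost(total_gb, hot_price)
    tiered = storage_cost(hot_gb, hot_price) + storage_cost(cold_gb, cold_price)
    return all_hot, tiered
```

For example, `compare_monthly_cost(10, 365, 14, 0.20, 0.02)` models 10 GB a day kept for a year, with two weeks held in the operational store. Swap in your own volumes and your provider's actual prices to see where you stand.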
Other than cost, what else affects where you might want to store data?
It’s all about your use cases, and therefore how you want to use and digest the data. How are you using the data now? How might you use it in the future? For instant analytics, you might want data from the last two weeks, and you want instant access to it. For machine learning training, you’ll want a year of data. For machine learning and business intelligence, you don’t need to access the data as regularly. Because these usage patterns differ, the data is laid out differently in an operational store than in a data lake.
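One common way this difference in layout shows up is row-oriented versus column-oriented storage. The post does not say which layouts a given platform uses, so treat the following as an assumption-laden sketch: the sample readings, field names and helper functions are all invented for illustration.

```python
from statistics import mean

# Row-oriented: one complete record per reading, convenient for
# "give me the latest reading for this sensor" lookups.
rows = [
    {"sensor_id": "s1", "ts": 1, "temperature_c": 70.0},
    {"sensor_id": "s2", "ts": 1, "temperature_c": 65.0},
    {"sensor_id": "s1", "ts": 2, "temperature_c": 74.0},
]


def latest_reading(rows, sensor_id):
    """Operational-style query: the most recent record for one sensor."""
    candidates = [row for row in rows if row["sensor_id"] == sensor_id]
    return max(candidates, key=lambda row: row["ts"]) if candidates else None


def to_columns(rows):
    """Pivot records into one list per field (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def mean_temperature(columns):
    """Analytics-style query: scans a single column, ignoring the others."""
    return mean(columns["temperature_c"])
```

The row layout answers the instant question quickly, while the column layout lets a long-range analysis read only the field it needs.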
Newbie’s guide to IoT data storage
IoT may be new to your business and you are learning as you go along. You buy devices and sensors and first connect them to a platform. Then you can start to see the data come in. And then, once this is all up and running, you can consider how you might use this data in your business. If you are in the relatively early stages of an IoT project, here are some ideas on how companies use their data which can be stored in a less costly manner in data lakes.
Once you gain experience in IoT, that experience, combined with your knowledge of your business and your customers, helps you envisage innovative uses for the data and know where it needs to be stored.
So, think hard about your data. Why are you keeping it? If you know you need it in future, but are just not sure how, are you storing it in the most cost-efficient way? Have a look at data lakes and operational databases and consider your costs in each.
That future use case might turn out to be the gem of an offering that you just didn’t know about yet. Just like the gadget you knew you wanted to keep, which came into its own in those months stuck in lockdown when you took up a new hobby.