Have you ever considered that the type of big data you have might also determine when you process it: right now, soon, or later?
To explain these differences, I would like to introduce the term “big data temperature”: the idea of finding the right temperature zone in which to process a given kind of big data.
Cold Big Data – Cold big data is data that you normally receive on an hourly, daily or weekly basis. It does not really matter how fast the data is delivered. The delay may arise because your data delivery site cannot transfer the data any sooner, or because the receiving technology (e.g. Hadoop HDFS) prefers data delivered in large chunks rather than grain by grain. Systems such as Hadoop or NoSQL databases are good candidates for storing even ice-cold data (i.e. data that you may not even know whether you will use later).
Warm Big Data – Whenever you need to process big data on demand, but not necessarily instantaneously, this is the right temperature zone for your data. Warm big data is best stored in in-memory data grids (e.g. BigMemory) or in-memory-enabled databases so that it is quickly accessible on demand. Whenever warm big data needs to be retained for a longer time (i.e. it is getting colder), it is recommended to let the data flow into the cold big data zone.
Hot Big Data – Hot big data is data that needs to be analyzed and accessed as close to real time as possible so that you can make instant decisions while the data is arriving. This is critical when you process data-in-motion sources, e.g. sensor or location data. Technologies such as Complex Event Processing (e.g. Apama) are tailored to that purpose. If you need to retain the data for subsequent access, or you want to apply patterns identified in historical data, it is recommended to move the data to the warm or cold zone.
Big data temperature zones should not be seen in isolation but as a flowing data concept, in which each zone is supported by the best-of-breed technology for it, assuming that in the near future no single technology will cover all zones.
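To make the three zones and the flow between them a little more concrete, here is a minimal Python sketch. It is only an illustration: the names `Zone`, `zone_for` and `cool_down` are hypothetical, and the latency cut-offs are assumed values chosen for the example, since no numbers are given above. The sketch picks a zone from how quickly a result is needed, and moves retained data one zone colder at a time.

```python
from enum import Enum


class Zone(Enum):
    HOT = "hot"    # analyzed while the data is still in motion, e.g. by a CEP engine
    WARM = "warm"  # held in memory for fast on-demand access
    COLD = "cold"  # delivered and stored in batches, e.g. on HDFS


# Assumed cut-offs in seconds, for illustration only.
HOT_MAX_LATENCY_S = 1.0
WARM_MAX_LATENCY_S = 60.0


def zone_for(required_latency_s: float) -> Zone:
    """Pick a zone from how quickly a result is needed after the data arrives."""
    if required_latency_s <= HOT_MAX_LATENCY_S:
        return Zone.HOT
    if required_latency_s <= WARM_MAX_LATENCY_S:
        return Zone.WARM
    return Zone.COLD


def cool_down(zone: Zone) -> Zone:
    """Move retained data one zone colder; cold data stays cold."""
    if zone is Zone.HOT:
        return Zone.WARM
    return Zone.COLD
```

With these assumed thresholds, a sensor alert that must be handled within 50 milliseconds lands in the hot zone, a dashboard query that can wait a few seconds lands in the warm zone, and an hourly batch delivery lands in the cold zone. Stepping one zone at a time is a simplification; hot data may just as well be moved directly to the cold zone.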