Dark data refers to all the assets and data an organisation has collected and stored but fails to use for business purposes. Unlike the dark web, there is nothing illegal or threatening about dark data; you can think of it simply as data that is left in the dark. A common way to describe dark data is with the acronym ROT, which stands for “Redundant, Obsolete and Trivial” data. Dark data is also commonly unstructured, untagged and untapped.
In this age of big data, companies ingest all kinds of data, whether structured or unstructured, raw or real-time, and it is generated in huge volumes within seconds. Not all of this data is utilised, and the resulting “dark data” sits in repositories for a long time, sometimes forever. According to IDC, up to 90% of unstructured data is never analysed. A study by IBM found that for most companies, 80% of their data is dark data that they are not fully utilising.
Dark data can accumulate for a number of reasons. In many cases, organisations are not even aware that they are collecting it, or they generate far more data than they can interpret. They may also have siloed data stores, or business intelligence and analytics systems that are inadequate for analysing all their data.
Dark data may involve a wide range of data categories, including:
Log files.
Customer information.
Geolocation data.
Raw survey data.
Financial statements.
Surveillance video footage.
Emails.
Old documents, notes and other files.
If dark data is not utilised, companies remain unaware of the insights and predictions it could provide, which might otherwise lead to further innovation and advancement. The most successful organisations are those that have the right strategies and tools to audit their dark data and leverage it to generate actionable insights.
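As a rough illustration of what a first-pass audit might look like, the following minimal sketch walks a directory tree and flags files along the ROT lines described earlier. The function names (`file_digest`, `audit_rot`), the definitions used for each category and the thresholds (`max_age_days`, `min_bytes`) are all assumptions made for this example, not an established method: "redundant" is taken to mean a byte-for-byte duplicate of another file, "obsolete" a file not modified within a given number of days, and "trivial" a file below a minimum size.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_rot(root, max_age_days=365, min_bytes=16, now=None):
    """Flag files under `root` as redundant, obsolete or trivial.

    The thresholds are illustrative assumptions, not recommendations.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    by_digest = defaultdict(list)
    report = {"redundant": [], "obsolete": [], "trivial": []}

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
                digest = file_digest(path)
            except OSError:
                continue  # unreadable file or broken link: skip it
            if info.st_mtime < cutoff:
                report["obsolete"].append(path)
            if info.st_size < min_bytes:
                report["trivial"].append(path)
            by_digest[digest].append(path)

    for paths in by_digest.values():
        # Treat the first path (in sorted order) as the copy to keep
        # and flag every other identical file as redundant.
        report["redundant"].extend(sorted(paths)[1:])

    for paths in report.values():
        paths.sort()
    return report
```

Calling `audit_rot("/path/to/share")` returns a dictionary of file paths per category. A file can appear in more than one category, and a real audit would also need to consider access patterns, ownership and retention rules, which this sketch ignores.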