Case Studies in Dark Data
- The mining of dark data can uncover compliance, financial and legal risks to a corporation.
- Data remediation and data governance are key factors in providing ongoing value.
- Dark data is available in multiple forms, not just legacy data. Video, audio and imaging artifacts also provide value.
- Mining dark data can provide a more holistic view of the organization's customers.
- Legacy data can also be used to improve current performance.
This report presents three case studies on the analysis and value of dark data: a financial services organization that used dark data in a risk management project, a health sciences corporation using analytics to improve patient care, and a manufacturing company repurposing its legacy data to optimize plant performance.
Case Study #1 — Risk Reduction
- A financial services organization needed better insight into, and control over, its dark data. It had been accumulating data for years, much of which had become irrelevant or obsolete. This data took the form of video, disorganized and unclassified documents, and other unstructured sources.
- This global company, with 50,000 employees, realized that this unmanaged data created significant security, legal, and operational risks. Dark data of this kind can result in regulatory sanctions, litigation or fines. A data breach could lead to the publication of private or sensitive information. Mismanagement of this data could even cause the shutdown of mission-critical systems.
- The company recognized that finding the dark data was not enough: governance of that old data, along with the new data being created daily, was a crucial piece of the remediation process. It therefore engaged a company specializing in information governance to help manage the challenge.
- After a thorough analysis of the data using proprietary software, and interviews with over 250 employees in the legal, compliance, IT and security departments, the findings were published. They included:
  - unsecured sensitive data,
  - personally identifiable information in unsecured areas,
  - improperly stored communications,
  - critical production data stored with no backup,
  - regulated data from an acquired entity resting in a location at risk of data loss,
  - hundreds of terabytes of old, obsolete data, which was increasing risk and costs.
- By the time the project was complete, 4.5 million gigabytes of data had been processed, over 200,000 documents had been classified and over 1 million gigabytes of data had been remediated. In addition, data management policies and procedures were implemented with the support of the IT and compliance teams.
Case Study #2 — Health Care
- Indiana University Health is examining ways to use dark and unstructured data to improve patient health care.
- One of the challenges in health care is the transactional nature of most patients' interactions with the medical establishment. This makes it difficult to provide holistic care for those seeking health care. Clinicians often do not have the information they need to make proper diagnoses.
- Digitized patient health records include free-form notes from physicians, both written and recorded, which are generated during consultations and treatments. The data being mined includes these free-form notes and audio files. The same technology used to retrieve data from physicians' notes can be used to analyze patient calls to a medical helpline.
- Their study of the dark data in their possession found that mining this data will provide a fuller view of the patient. It helps build patient loyalty and provide useful, cost-effective and seamless health care.
- They also found that the data in their possession could be used in new ways to help "predict need and manage care across populations."
- As a result of the new uses for dark data, IU Health is using cognitive computing, external data, and patient data to help "identify patterns of illness, health care access, and historical outcomes in local populations."
- The hospital expects to be able to incorporate into patient records the socioeconomic factors and living conditions that may affect patients’ engagement with health care providers.
Case Study #3 — Manufacturing
- An enormous amount of data has been generated from mining operations and mineral processing plants. As online monitoring and in-situ instrumentation become ubiquitous through the fourth industrial revolution, the quantity and quality of data will continue to grow. Managing that data and maximizing its value is a challenge facing the mining and manufacturing sectors. The integration of accurate, reliable and timely data analytics is key to enabling "data-driven plant design, optimisation and monitoring."
- An examination of a gold processing plant's legacy data uncovered operational data that contained both metallurgical and metal accounting information. The gold company realized that mining this previously unused data could "optimize plant performance beyond what is possible with conventional control systems by providing additional control dimensions, increased rate of feedback (e.g. near real-time) or more quantitative feedback control loops." The challenge was to optimize plant performance using predictive analysis and quality feedback.
- The company used the ratio between total raw gold and recovered gold, which revealed both the points in the process and the shifts or times of day at which more gold was being lost. The study posited that environmental factors that vary significantly during morning shifts, such as changes in ambient temperature, could be contributing, along with workers' general sleepiness.
- Although the company's results were kept confidential in this study, the report's summary discussed several possibilities. The easiest to identify was that "existing operational data combined with data analytics produces insightful information and actionable knowledge to drive continuous improvement in the plant’s performance."
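The ratio-by-shift comparison described in this case study can be illustrated with a minimal sketch. Because the plant's actual figures were confidential, every number below is invented for illustration, and the names `RECORDS`, `recovery_by_shift` and `worst_shift` are hypothetical. The sketch assumes the ratio is computed as recovered gold divided by gold in the feed, totalled per shift; the study may have defined or aggregated it differently.

```python
from collections import defaultdict

# Hypothetical records: (shift, gold in feed in grams, gold recovered in grams).
# All values are invented for illustration; the study's data were confidential.
RECORDS = [
    ("morning", 1000.0, 900.0),
    ("morning", 1200.0, 1080.0),
    ("afternoon", 1000.0, 940.0),
    ("afternoon", 800.0, 760.0),
    ("night", 1000.0, 930.0),
    ("night", 1000.0, 950.0),
]


def recovery_by_shift(records):
    """Return {shift: total recovered / total feed} for each shift."""
    feed = defaultdict(float)
    recovered = defaultdict(float)
    for shift, feed_g, recovered_g in records:
        feed[shift] += feed_g
        recovered[shift] += recovered_g
    return {shift: recovered[shift] / feed[shift] for shift in feed}


def worst_shift(records):
    """Return the shift with the lowest recovery ratio (the highest loss)."""
    ratios = recovery_by_shift(records)
    return min(ratios, key=ratios.get)


if __name__ == "__main__":
    for shift, ratio in sorted(recovery_by_shift(RECORDS).items()):
        print(f"{shift}: recovery {ratio:.1%}, loss {1 - ratio:.1%}")
    print("lowest recovery:", worst_shift(RECORDS))
```

Grouping the same records by process stage instead of by shift would point to where in the process the losses occur, the other dimension the case study mentions.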
For this research on dark data, we drew on the most reputable sources of information available in the public domain. These include US consulting companies specializing in information management, such as FTIInsights; the global research and consulting firm Deloitte; and research partnerships between universities and consulting firms in the mining field, including the Department of Civil, Environmental and Natural Resources Engineering, Luleå University of Technology, SE-97187 Luleå, Sweden; the School of Geosciences, University of the Witwatersrand, Private Bag 3, Wits 2050, South Africa; PG Techno Wox, 43 Patrys Avenue, Helikon Park, Randfontein 1759, South Africa; and Eurus Mineral Consultants (EMC), Plettenberg Bay, South Africa.