Sanitise Your System Dumps!

Manish Kesarwani
3 min read · Jul 30, 2021

Enterprises capture a vast amount of data about their customers and business processes. This data is valuable and gives them a competitive advantage. It also contains sensitive information about the business and its customers, so it must be protected against inadvertent leakage to anyone, whether internal or external to the enterprise. Government regulations such as the Health Insurance Portability and Accountability Act (HIPAA) [1] and the General Data Protection Regulation (GDPR) [2] put the onus on the enterprise to protect customer data. Failing to do so can lead to substantial financial and regulatory penalties.

All applications, including those processing sensitive data, run on the enterprise's IT infrastructure. Inevitably, these applications will experience problems and crash. To identify the root cause of a crash, diagnostic data such as logs, traces, and memory dumps are captured.

For illustration purposes, below is a snapshot of a parsed system dump file with some custom sensitive data:

Sample System Dump

Diagnostic data such as logs and memory dumps from production systems are often shared with development teams to do root cause analysis of system crashes. Invariably such diagnostic data contains sensitive information, and sharing it can lead to data leaks.

Existing solutions [3–5] for this problem have primarily looked at extending the programming language so that developers can specify which memory locations may contain sensitive data. During diagnostic data capture, data from these locations is either not stored or redacted. This is not enough, since developers may not mark every place where sensitive data may be present. Additionally, it requires extending every programming language used to develop the applications.

In our recent paper, accepted at the 2021 IEEE International Conference on Cloud Computing, we present the Knowledge and Learning-based Adaptable System for Sensitive InFormation Identification and Handling (KLASSIFI), an end-to-end system capable of identifying and redacting sensitive information present in diagnostic data. KLASSIFI takes a generic memory dump as input and outputs a memory dump in which all the sensitive information has been redacted in a format-preserving manner.
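To make "format-preserving" concrete, here is a minimal Python sketch. It is not KLASSIFI's implementation. The two regular expressions and the function names are assumptions chosen for illustration. It shows only the property described above: every redacted value keeps its length and character classes, so the surrounding bytes in the dump do not move.

```python
import re

# Hypothetical identifiers, for illustration only (not KLASSIFI's Knowledge Base).
CARD_LIKE = re.compile(rb"\d{13,16}")
EMAIL_LIKE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_preserving_format(value: bytes) -> bytes:
    """Replace ASCII digits with '0' and ASCII letters with 'x'/'X'.

    Separators such as '@' and '.' are kept, so the masked value has the
    same length and shape as the original.
    """
    out = bytearray()
    for b in value:
        if 0x30 <= b <= 0x39:        # '0'-'9'
            out += b"0"
        elif 0x61 <= b <= 0x7A:      # 'a'-'z'
            out += b"x"
        elif 0x41 <= b <= 0x5A:      # 'A'-'Z'
            out += b"X"
        else:
            out.append(b)
    return bytes(out)


def redact(dump: bytes) -> bytes:
    """Mask every match of the assumed identifiers without changing dump size."""
    for pattern in (CARD_LIKE, EMAIL_LIKE):
        dump = pattern.sub(lambda m: mask_preserving_format(m.group()), dump)
    return dump


sample = b"\x00\x01name=alice@example.com\x00card=4111111111111111\xff"
redacted = redact(sample)
```

Because each match is replaced byte for byte, offsets elsewhere in the dump stay valid. A real system would need far more than two regular expressions to find sensitive data reliably.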

System Architecture of KLASSIFI

The figure above shows the high-level architecture of KLASSIFI. The arrows indicate the flow of data between its components.

KLASSIFI is highly customisable: it can be adapted to different business use cases simply by changing its configuration. It also keeps intact all the meta-information required by debuggers, such as page headers, so that the redacted dump remains valid and useful.
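As a rough sketch of these two ideas, the Python below drives redaction from a small configuration object and never touches the header of each page. The page layout, the field names in `RedactionConfig`, and the nine-digit pattern are all hypothetical; the actual dump format and configuration schema used by KLASSIFI are not described here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionConfig:
    # All values are illustrative assumptions, not a real dump layout.
    page_size: int = 32          # bytes per page
    header_size: int = 10        # leading bytes of each page a debugger needs
    pattern: bytes = rb"\d{9}"   # what counts as sensitive for this use case


def redact_pages(dump: bytes, cfg: RedactionConfig) -> bytes:
    """Redact matches in each page's payload and copy its header unchanged."""
    regex = re.compile(cfg.pattern)
    out = bytearray()
    for start in range(0, len(dump), cfg.page_size):
        page = dump[start:start + cfg.page_size]
        header, payload = page[:cfg.header_size], page[cfg.header_size:]
        # Same-length replacement keeps the page size, and so the layout, intact.
        payload = regex.sub(lambda m: b"0" * len(m.group()), payload)
        out += header + payload
    return bytes(out)


cfg = RedactionConfig()
page = b"H123456789" + b"id=987654321".ljust(22, b"\x00")
dump = page * 2
clean = redact_pages(dump, cfg)
```

Switching to a different use case then means passing a different `RedactionConfig`, with no code changes. This simplified version scans each page separately, so it would miss a value that straddles a page boundary.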

KLASSIFI has a built-in Knowledge Base comprising a comprehensive suite of identifiers that can recognise a large number of sensitive information types. Additionally, KLASSIFI allows a user to extend this Knowledge Base with domain-specific and user-specific identifiers. KLASSIFI also has a feedback loop that lets a user report inaccurate identifications.
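One way to picture this combination of built-in identifiers, user extensions, and feedback is the toy Python sketch below. The class name, method names, and regular expressions are assumptions for illustration; KLASSIFI's actual Knowledge Base and learning components are described in the paper, not here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy registry of named identifiers plus user feedback (hypothetical API)."""
    identifiers: dict = field(default_factory=dict)      # name -> compiled regex
    false_positives: set = field(default_factory=set)    # (name, value) pairs

    def add_identifier(self, name: str, pattern: str) -> None:
        """Register a new domain- or user-specific identifier."""
        self.identifiers[name] = re.compile(pattern)

    def report_false_positive(self, name: str, value: str) -> None:
        """Feedback loop: remember that this hit was not actually sensitive."""
        self.false_positives.add((name, value))

    def identify(self, text: str) -> list:
        """Return (identifier name, matched value) pairs, minus reported errors."""
        hits = []
        for name, regex in self.identifiers.items():
            for m in regex.finditer(text):
                if (name, m.group()) not in self.false_positives:
                    hits.append((name, m.group()))
        return hits


kb = KnowledgeBase()
# Stand-in for a built-in identifier.
kb.add_identifier("email", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# User-added, domain-specific identifier (made-up employee ID format).
kb.add_identifier("employee_id", r"EMP-\d{4}")

text = "user bob@example.com logged in as EMP-0042"
before = kb.identify(text)
kb.report_false_positive("employee_id", "EMP-0042")
after = kb.identify(text)
```

Here feedback simply suppresses an exact value that the user has flagged. A learning-based system could instead use such reports to adjust the identifiers themselves.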

More details on KLASSIFI will be published soon in the proceedings of the 2021 IEEE International Conference on Cloud Computing.

References:

[1] U.S. Department of Health and Human Services, “Health Insurance Portability and Accountability Act,” accessed June 10, 2020. [Online]. Available: https://www.hhs.gov/hipaa/index.html

[2] European Commission, “General Data Protection Regulation,” accessed June 10, 2020. [Online]. Available: https://gdpr.eu/

[3] R. Ding, H. Hu, W. Xu, and T. Kim, “Desensitization: Privacy-aware and attack-preserving crash report,” 2020.

[4] R. Feng, S. S. Jia, W. Lijun et al., “Protecting sensitive data in software products and in generating core dumps,” Dec. 26, 2017, US Patent 9,852,303.

[5] P. Broadwell, M. Harren, and N. Sastry, “Scrash: A system for generating secure crash information,” in USENIX Security Symposium, 2003, p. 19.
