Global Medical Technology Company

How HIPAA-Compliant Data Anonymization Helped a Global Medical Technology Company Accelerate Its R&D

Industry: Healthcare
Services: AI Engineering, Product Development
Technologies: Machine Learning, Image Recognition, Natural Language Processing

The Challenge

A global medical technology company needed to unlock the value of its health data by anonymizing terabytes' worth of medical images and clinical notes.

Our Approach

We built a multi-part machine learning (ML) system that identifies all personally identifiable information (PII) and protected health information (PHI) in image and text files, blurring it in images and anonymizing it in text.

The Results

  • Data anonymization integrated into new and existing solutions

  • Faster, safer data sharing with development partners

  • New data monetization opportunities

Connect with our experts
Sydnor Gammon
Partner & VP, Business Development
Sydnor drives digital innovation in healthcare and wellness, specializing in solutions that transform and enhance patient care and clinical workflows through AI, voice, and virtual technologies.

The Challenge

Unlock Medical Data Value While Safeguarding Privacy


To unlock the value in its terabytes of medical images and clinical notes, one of the world’s largest medical technology companies had to anonymize and format all of its data while maintaining HIPAA compliance. That meant identifying and anonymizing every instance of PII and PHI in every image and text file.

But identifying PII and PHI is difficult because sensitive information can be embedded in unpredictable ways (e.g., a physician’s handwritten notes about a patient, machine metadata, a half-visible hospital or manufacturer logo). Moreover, the sensitive and confidential nature of medical data ruled out using publicly available large language models (LLMs) to develop a solution.

The medical technology company needed a partner with expertise in ML data recognition models, optical character recognition (OCR) algorithms, and custom AI models. It chose WillowTree.

Our Approach

Custom ML Models Detect & Blur Sensitive Information in Medical Images & Text Files

Our healthcare client needed an ML system sophisticated enough to identify and protect sensitive information while preserving its medical data’s scientific value. We helped them develop a single system combining three powerful components:

  • An advanced image processing system using Tesseract OCR to detect and blur sensitive information within image files.
  • A natural language processing (NLP) engine built on the spaCy framework to identify and anonymize PHI within text files.
  • A data review and annotation system that generates training data for future models, so that each new model version can improve in accuracy and performance.
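To make the image and text components more concrete, here is a minimal Python sketch of the final redaction step for each, assuming detection has already happened. It is illustrative only, not the system described here. Two assumptions matter: OCR output is taken to be in the dictionary layout that pytesseract's `image_to_data(..., output_type=Output.DICT)` returns (parallel `text`, `left`, `top`, `width`, `height` lists), and text detections are `(start_char, end_char, label)` tuples, the same information a spaCy entity exposes as `ent.start_char`, `ent.end_char`, and `ent.label_`. The regex `SENSITIVE_WORD` and the helpers `boxes_to_blur` and `redact_text` are hypothetical stand-ins for the trained detectors.

```python
import re
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical word-level rule standing in for a trained detector:
# slash-separated dates such as 03/14/1962, or runs of 6+ digits
# (e.g. record numbers).
SENSITIVE_WORD = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}|\d{6,})$")


def boxes_to_blur(ocr: Dict[str, list],
                  is_sensitive: Callable[[str], bool],
                  pad: int = 2) -> List[Box]:
    """Return padded pixel boxes for OCR words flagged as sensitive.

    `ocr` is assumed to use the layout of pytesseract's
    image_to_data(..., output_type=Output.DICT): parallel lists keyed by
    'text', 'left', 'top', 'width' and 'height'.
    """
    boxes: List[Box] = []
    for i, word in enumerate(ocr["text"]):
        word = word.strip()
        if not word or not is_sensitive(word):
            continue  # skip empty layout rows and non-sensitive words
        left, top = ocr["left"][i], ocr["top"][i]
        right, bottom = left + ocr["width"][i], top + ocr["height"][i]
        # Pad so the blur also covers the edges of the glyphs.
        boxes.append((max(left - pad, 0), max(top - pad, 0),
                      right + pad, bottom + pad))
    return boxes


def redact_text(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace (start_char, end_char, label) spans with [LABEL] tags.

    Spans are applied right to left so earlier offsets stay valid;
    they are assumed not to overlap.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


if __name__ == "__main__":
    ocr = {"text": ["Patient", "DOB", "03/14/1962", ""],
           "left": [10, 80, 130, 0], "top": [5, 5, 5, 0],
           "width": [60, 40, 90, 0], "height": [20, 20, 20, 0]}
    print(boxes_to_blur(ocr, lambda w: bool(SENSITIVE_WORD.match(w))))
    # → [(128, 3, 222, 27)]
    note = "Seen by Dr. Lee on 03/14/2024."
    print(redact_text(note, [(12, 15, "PERSON"), (19, 29, "DATE")]))
    # → Seen by Dr. [PERSON] on [DATE].
```

In a full pipeline, each returned box could then be blurred, for example with Pillow's `ImageFilter.GaussianBlur`, after clamping it to the image bounds. Replacing text spans right to left is what keeps the remaining character offsets valid; spaCy's `doc.ents` are non-overlapping, which matches the sketch's assumption.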

A human-in-the-loop mechanism adds a further safeguard while also improving performance. The system tags each detection with a confidence score and flags low-confidence results for manual review. Developer feedback on those results then helps the system learn, driving better performance over time.
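The confidence-based review loop can be sketched in a few lines of Python. This is a hedged illustration, not the client's implementation: the `Detection` and `ReviewRouter` names and the 0.9 threshold are hypothetical, since the case study does not say how scores are thresholded or how feedback is stored. Detections scoring at or above the threshold are anonymized automatically; the rest wait in a queue until a person confirms or rejects them, and each verdict is kept as a labeled example for the next training round.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    source_id: str     # file the detection came from
    text: str          # detected string, e.g. an OCR word or NER entity
    label: str         # entity type, e.g. "PERSON" or "DATE"
    confidence: float  # model score in [0, 1]


@dataclass
class ReviewRouter:
    # Hypothetical cut-off; the real value is not stated in the case study.
    threshold: float = 0.9
    to_anonymize: List[Detection] = field(default_factory=list)
    review_queue: List[Detection] = field(default_factory=list)
    # (text, label, reviewer_says_is_phi) triples kept for retraining.
    training_examples: List[Tuple[str, str, bool]] = field(default_factory=list)

    def route(self, det: Detection) -> str:
        """Anonymize confident detections; queue the rest for a human."""
        if det.confidence >= self.threshold:
            self.to_anonymize.append(det)
            return "auto"
        self.review_queue.append(det)
        return "review"

    def record_feedback(self, det: Detection, is_phi: bool) -> None:
        """Take a reviewed item off the queue and keep the verdict as a label."""
        self.review_queue.remove(det)
        self.training_examples.append((det.text, det.label, is_phi))
        if is_phi:
            self.to_anonymize.append(det)  # confirmed PHI is anonymized too


if __name__ == "__main__":
    router = ReviewRouter()
    sure = Detection("note_17.txt", "J. Alvarez", "PERSON", 0.97)
    unsure = Detection("scan_04.png", "St. Mary", "ORG", 0.55)
    print(router.route(sure), router.route(unsure))  # → auto review
    router.record_feedback(unsure, is_phi=True)
    print(len(router.to_anonymize), len(router.review_queue))  # → 2 0
```

A single threshold keeps the policy easy to audit. In a HIPAA setting, a team might instead mask low-confidence detections provisionally and let reviewers unmask false positives, trading some data utility for a lower risk of leaking PHI.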


The Results

Accelerated R&D + New Sources of Revenue

By developing a HIPAA-compliant data anonymization system for our client’s medical image and text files, we made it possible for them to integrate data anonymization into new and existing healthcare solutions. It also lets them share valuable health data with research partners faster and more securely, accelerating innovation while protecting patient privacy.

Data anonymization also creates new potential revenue opportunities for our client. For instance, our client could use their anonymized data to:

  • Train new AI models for detecting diseases.
  • Apply predictive analytics for drug discovery.
  • Sell the data for similar research and development purposes.

“The future of healthcare innovation lies in how well organizations manage and use the vast quantity of data generated every day. With the right partner, medical technology companies can turn ‘byproduct’ data into innovation and commercialization opportunities, leveraging advanced data management and analytics to power research and deliver real-world impact.”

Sydnor Gammon, VP, Business Development at WillowTree