Detecting Brain Imaging Anomalies Using Generative AI
Post by Amanda Engstrom
The takeaway
Generative Artificial Intelligence (AI) has become a useful tool for synthesizing large brain imaging datasets and detecting pathological anomalies, but not without error. Introducing metrics that evaluate how well models learn normative representations of healthy brain tissue can improve anomaly detection and diagnostic accuracy.
What's the science?
The advancement of medical imaging technologies has increased doctors’ ability to diagnose a variety of diseases, but it has also created the challenge of integrating and analyzing large volumes of complex imaging data. To capture the complexity and rarity of human pathologies, generative AI has been harnessed for the automated detection of pathological anomalies. Normative representation learning aims to model the typical anatomy of the healthy brain using large human datasets. This week in Nature Communications, Bercea and colleagues introduce three novel metrics that evaluate normative representation learning in generative AI models, focusing on how well the models capture typical anatomical distributions in healthy individuals, and test them against various brain pathologies.
How did they do it?
The authors propose three metrics that evaluate the quality of the pseudo-healthy restorations by generative AI models. These metrics are:
1) Restoration Quality Index (RQI), which evaluates the perceived quality of the synthesized images;
2) Anomaly to Healthy Index (AHI), which measures how closely the distribution of restored pathological images matches a healthy reference set; and
3) Healthy Conservation and Anomaly Correction Index (CACI), which measures how well a model both preserves the integrity of healthy regions and corrects anomalies in pathological areas.
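The paper’s exact formulas are not reproduced in this summary, but the intuition behind two of the metrics can be sketched in a few lines of Python. Everything below is an illustrative assumption, not the authors’ implementation: the toy 1-D "scans", the `ahi_proxy` and `caci_proxy` functions, and their scalings are all hypothetical stand-ins for the real image-based metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for brain scans: 1-D arrays instead of 3-D volumes.
healthy_ref = rng.normal(0.0, 1.0, size=(50, 64))   # healthy reference set
original = rng.normal(0.0, 1.0, size=64)            # the scan before disease
anomaly_mask = np.zeros(64, dtype=bool)
anomaly_mask[20:30] = True
pathological = original.copy()
pathological[anomaly_mask] += 4.0                   # inject a bright "lesion"

# A pseudo-healthy restoration: the model's attempt to undo the lesion.
restored = pathological.copy()
restored[anomaly_mask] = original[anomaly_mask] + rng.normal(0.0, 0.1, 10)

def ahi_proxy(restored, healthy_ref):
    """Toy AHI: distance of the restoration to the healthy reference
    distribution, scaled by the typical spread of that reference."""
    mu = healthy_ref.mean(axis=0)
    d = np.linalg.norm(restored - mu)
    spread = np.mean([np.linalg.norm(h - mu) for h in healthy_ref])
    return d / spread   # ~1.0 means "as close to the mean as a healthy scan"

def caci_proxy(restored, pathological, original, mask):
    """Toy CACI: penalize edits in healthy regions and residual error
    (relative to the pre-lesion signal) in the anomalous region."""
    healthy_change = np.mean(np.abs(restored[~mask] - pathological[~mask]))
    anomaly_residual = np.mean(np.abs(restored[mask] - original[mask]))
    return healthy_change + anomaly_residual   # lower is better

print(round(ahi_proxy(restored, healthy_ref), 2))
print(round(caci_proxy(restored, pathological, original, anomaly_mask), 3))
```

A model that simply copied the pathological input would leave the healthy regions untouched (good) but the lesion uncorrected (bad), which is the tension these metrics are designed to expose.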
The authors used these metrics to evaluate current generative AI frameworks, assessing each model’s ability to learn and apply normative representations. Models were trained on over 500 healthy scans and evaluated on two datasets encompassing a wide spectrum of brain pathologies. After ranking each model’s performance on the normative learning metrics, the authors determined how this ranking related to standard anomaly detection metrics.
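The last step above, relating each model’s normative-learning ranking to its anomaly-detection performance, is essentially a rank-correlation check. A minimal sketch follows; the Spearman correlation and the per-model scores below are illustrative assumptions, not the authors’ exact procedure or data:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical per-model scores for five models: a combined
# normative-learning score and an anomaly-detection score (e.g., AUROC).
normative = [0.81, 0.62, 0.90, 0.55, 0.73]
detection = [0.78, 0.60, 0.88, 0.52, 0.70]

print(round(spearman(normative, detection), 3))  # → 1.0 (identical rankings)
```

A correlation near 1 would mean the normative-learning metrics predict which models detect anomalies best, which is the relationship the study examines.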
Finally, the authors performed a clinical evaluation of their metrics with 16 radiologists. Experts were shown 180 randomized images (30 pathology-free originals and 30 generated by each of 5 different AI models) and asked to rate each image for ‘Realness’, ‘Image Quality’, and ‘Health Status’. These ratings were used to assess both the effectiveness of the new metrics and the clinical relevance of the learned representations.
What did they find?
After applying their three normative learning metrics, the authors found that each individual metric offers a unique perspective on model performance. Methods that simply replicate input images, like autoencoders, achieve a high RQI but score poorly on AHI and CACI. Conversely, models that remove anomalies, such as variational autoencoders or latent transfer models, have improved CACI but poor RQI because their output images are typically blurry. The AHI metric was the most challenging for all models. Guided restoration techniques using intelligent masking tended to achieve the highest overall scores. Models that collectively optimized all three metrics (RQI, AHI, and CACI) demonstrated enhanced anomaly detection power, highlighting the importance of balancing all three metrics rather than relying on any one individually.
In the clinical validation, radiologists found no significant difference between AI-generated and real images. Even real non-pathological images showed variability in scoring, particularly in health score, and real images scored only marginally higher for ‘Realness’. Models such as AutoDDPM and the RA method, which both scored within the top 5 for normative learning, received scores similar to the real images for ‘Realness’ and ‘Health’, respectively. Overall, the clinical validation showed that the proposed RQI, and to a lesser extent the AHI (CACI could not be evaluated in this study design), correlated well with clinical assessments.
What's the impact?
This study found that generative AI models that score highly on normative learning metrics are better at detecting diverse brain pathologies. These metrics provide a framework for evaluating AI models with greater clinical relevance. Advanced AI medical imaging is a promising diagnostic tool that can assist clinicians by increasing workflow efficiency and diagnostic accuracy and, ultimately, improving patient care.