Impact
Analyzed
569 tumor samples across 30 diagnostic features
Uncovered
a black-box trust problem, not just a data problem
Designed
10 interactive controls for real-time clinical reasoning
How Might We
“...create tools that make cancer diagnostic data more interpretable, transparent, and actionable for radiologists?”
why diagnosing breast cancer is complex
Understand the challenges radiologists face when interpreting ambiguous tumor data and imaging features
how ML models can surface explainable insights
Explore how machine learning predictions can be made transparent and clinically meaningful rather than opaque
how UI design builds clinical trust
Identify how interface design can surface feature importance and reduce cognitive load in diagnostic workflows
How this was done
01
Secondary Research
Literature review mapping where and why diagnostic errors occur, and where existing AI tools fall short of clinical needs.
02
Machine Learning Model
Built a breast cancer classifier at University of Nottingham using the Wisconsin dataset, surfacing which features drove malignancy predictions.
03
Quantitative Analysis
Revisited the model through deeper data exploration — distributions, correlations, edge cases — to shape UI priorities and surface uncertainty.
04
UI/UX Prototyping
Iterated from a basic input form (V1) to a transparent, decision-supportive dashboard (V3) using Streamlit and vibe-coding.
The Spark
How Do Radiologists Make Decisions When the Signs Aren't Obvious?
This project started in a machine learning course at the University of Nottingham, working with a breast cancer dataset. At first, the focus was accuracy — adjusting models, testing features, improving results.
But the deeper the analysis went, the more important the question became: what does the data hide from the people who need it most?
When someone close to me was diagnosed with breast cancer, I saw how much uncertainty surrounds the diagnostic process. It made me curious about how design and machine learning could bring more clarity and confidence to real-world diagnosis. That experience stayed with me, and later — through the University of Washington — I had the chance to take it further.
"Misdiagnosis and overdiagnosis remain key challenges in breast cancer imaging, where conventional mammography may fail to detect lesions." — Thomassin-Naggara et al. (2024)
Research question: How do radiologists make sound decisions when the underlying data — and the AI tools built on it — give them no explanation for the result?
The Challenges
A System Hidden Behind a Black Box
Despite the availability of tumor metrics and imaging data, determining whether a tumor is benign or malignant is rarely straightforward. Reading mammograms and ultrasound scans is highly nuanced and often subjective — small differences in shape, margin, or density can be hard to interpret and may lead to inconsistent assessments across clinicians, especially in borderline or low-quality cases.
1. Human error in diagnostic imaging
False negatives in mammography range from 12–30%, depending on case complexity. Up to 50% of cancers in dense breast tissue may go undetected without additional imaging. Benign findings can be flagged as dangerous, leading to unnecessary biopsies and significant patient anxiety.
2. AI models that explain nothing
"If the model just says 84% malignant, what am I supposed to do with that?"
Most existing AI diagnostic tools are black boxes — they surface a probability score without explaining which features drove the result. Generic outputs like "malignant: 84%" lack clinical value. Radiologists need to understand why a case is flagged, not just that it is.
3. Tools that don't fit clinical workflows
Existing tools often miss the realities of how radiologists actually work. They lack human-centered design: no contextual guidance, no feature-level transparency, no acknowledgment of uncertainty. The gap between model output and clinical reasoning remains wide.
The Research Process
Secondary Research
Purpose: Develop a foundational understanding of why breast cancer diagnosis is often so complex, and identify where existing tools fail clinicians.
Research questions explored:
- What are the primary sources of diagnostic error in breast cancer imaging?
- Why do clinicians distrust AI-generated predictions in diagnostic workflows?
- What design approaches have shown promise in supporting clinical decision-making?
Key findings:
Human error factors
- False negatives in mammography range from 12–30%, depending on case complexity and image quality
- Up to 50% of cancers in dense breasts may go undetected without additional imaging
- Benign findings can be flagged as dangerous, leading to unnecessary biopsies
Gaps in existing tools
- AI models are often hidden behind a black box — most don't explain why a case is flagged, limiting clinical trust
- Generic outputs like "malignant: 84%" lack clinical value and don't answer "why?"
- Tools often miss real-world workflows — a gap in human-centered design
Opportunities for innovation
- Tools that show why a tumor is flagged by surfacing the specific features influencing the decision
- Radiologists want tools that support clinical judgment, not automate it
- Outputs should adapt to case-specific contexts like dense breast tissue or borderline features
Machine Learning Model
Purpose: Explore how tumor characteristics predict malignancy, and uncover which features matter most — laying the foundation for an interface that surfaces meaningful, case-specific insights.
The classifier was built using the Wisconsin Breast Cancer Dataset (569 samples, 30 features) during coursework at the University of Nottingham. At that stage, the goal was technical: train a model, improve accuracy, understand which features drove predictions.
But building the model revealed something more important than the accuracy score — which features were most influential and in what combination. That became critical design input later. Understanding the model's internal reasoning was a prerequisite for designing an interface that could communicate it to clinicians.
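To give a sense of the mechanics, here is a minimal sketch using scikit-learn's bundled copy of the Wisconsin dataset. The random forest is an illustrative stand-in, not necessarily the classifier from the coursework:

```python
# Minimal sketch: train a classifier on the Wisconsin dataset and
# surface which features drive malignancy predictions.
# RandomForestClassifier is an assumption for illustration; the
# original coursework model may have differed.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()  # 569 samples, 30 features
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")

# Rank features by importance: the design input that mattered most later.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, weight in ranked[:5]:
    print(f"{name:25s} {weight:.3f}")
```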
Quantitative Data Exploration
Purpose: Revisit the model through deeper data analysis to understand how predictions work in detail — which features to emphasize, how uncertainty shows up, and where edge cases might cause confusion.
After committing to making this a design project, I returned to the dataset with new questions. Statistical analysis and visualization surfaced diagnostic patterns that informed every major interface decision:
| Visualization | What it revealed |
|---|---|
| Feature distributions | Which metrics most reliably separated benign from malignant |
| Correlation matrix | Feature redundancy — reducing the 30 inputs to a meaningful subset |
| Scatter comparisons | Where benign/malignant classes overlap and edge cases emerge |
| Mean vs. worst metrics | Why worst-case measurements prevent dangerous outliers from hiding in averages |
| Decision boundary maps | Where the model is confident vs. uncertain |
These visualizations shaped the design direction: what to prioritize, how to handle ambiguity, and how to build trust through clarity and transparency.
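As a rough illustration of this exploration step, a sketch using pandas and seaborn (the original analysis may have used different tooling):

```python
# Sketch of two of the exploratory plots described above: a class-split
# feature distribution and a correlation matrix. Tooling is an assumption.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["diagnosis"] = data.target  # 0 = malignant, 1 = benign

# Feature distribution: how cleanly does one metric separate the classes?
sns.kdeplot(data=df, x="worst radius", hue="diagnosis", common_norm=False)
plt.show()

# Correlation matrix: spot redundancy before trimming the 30 inputs.
corr = df.drop(columns="diagnosis").corr()
sns.heatmap(corr, cmap="coolwarm", center=0,
            xticklabels=False, yticklabels=False)
plt.show()
```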
Insights & Triangulation
From Three Methods to Four Interface Decisions
Before designing the interface, I triangulated findings across secondary research, model behavior, and quantitative data analysis to create a research-to-design matrix. The matrix surfaced ten core findings that revealed where errors happen, what users need, and how AI predictions can be made more interpretable.
3 methods → 10 findings → 4 interface decisions
By combining model behavior, pattern analysis, and literature on breast cancer diagnostic processes, I mapped the most impactful pain points: unreliable feature weighting, lack of transparency, edge cases, and cognitive overload. Each design decision directly addresses these breakdowns with targeted interface responses.
Enhanced Diagnostic Precision with Mean & Worst Metrics
Radiologists don't just look at a tumor's average size — they also zero in on its single most abnormal region. Missing that one extreme spot can lead to under-diagnosis.
"Existing breast imaging studies reported the entropy, mean, minimum, and maximum as important features." — Lee et al. (2020)
Mean
Calculated by sampling a measurement (e.g., radius, area, texture) at dozens of points on the same tumor and averaging them. Reflects the lesion's typical size, shape, or heterogeneity. Without the mean, radiologists lose important baseline context — the model's ability to distinguish benign from malignant depends on understanding a lesion's typical appearance.
Worst
Taken from the same measurements — the single largest (or near-largest) value. Highlights the most abnormal "hot spot" that may warrant targeted biopsy. Worst-case metrics prevent dangerous outliers from hiding in the average, ensuring that even small but aggressive regions are flagged for further clinical attention.
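A toy example with made-up numbers shows why both views matter: one aggressive region barely moves the mean but dominates the worst-case value.

```python
import numpy as np

# Toy illustration (synthetic numbers): radius sampled at several points
# on one lesion, with a single extreme "hot spot" at the end.
radii = np.array([10.1, 10.4, 9.8, 10.2, 10.0, 17.9])

mean_radius = radii.mean()   # 11.4 -- the outlier almost disappears
worst_radius = radii.max()   # 17.9 -- the hot spot is preserved
print(f"mean: {mean_radius:.1f}, worst: {worst_radius:.1f}")
```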
MVP Strategy & Design
Building in Code: Why Streamlit
I co-built the interface in Streamlit, iterating directly in code (vibe-coding), guided by user needs and model behavior. Streamlit allowed me to keep full control over the model logic while rapidly prototyping interfaces that stayed true to the algorithm's outputs.
Unlike visual design tools, Streamlit let me directly connect model predictions with interface elements — making it easier to test ideas in real time, adjust how probabilities were framed, surface uncertainty, and experiment with interactive features like sliders, feature-importance graphs, and confidence estimates.
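A stripped-down sketch of that pattern is below; the saved model file, feature set, and slider ranges are placeholders, not the project's actual code:

```python
# Sketch of the Streamlit pattern: sliders feed the model directly,
# and the script re-runs on every interaction, so predictions update live.
import joblib
import streamlit as st

# Hypothetical saved model trained on four metrics; placeholder filename.
model = joblib.load("classifier.joblib")

st.title("Tumor malignancy estimate")
mean_radius = st.slider("Mean radius", 6.0, 30.0, 14.0)
mean_texture = st.slider("Mean texture", 9.0, 40.0, 19.0)
worst_radius = st.slider("Worst radius", 7.0, 40.0, 16.0)
worst_area = st.slider("Worst area", 180.0, 4300.0, 880.0)

# Class 1 is assumed malignant in this hypothetical model.
proba = model.predict_proba([[mean_radius, mean_texture,
                              worst_radius, worst_area]])[0][1]
st.metric("Estimated probability of malignancy", f"{proba:.0%}")
```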
V1 — Basic Inputs, No Guidance
The first version was a straightforward input form where users manually entered four tumor metrics to generate a prediction. While functional, it offered no interpretive support, making the experience feel opaque and limiting users' ability to trust or make sense of the output.
Problem identified: A single probability score ("malignant: 84%") without context is not clinically useful. Radiologists had no way to verify whether the model's reasoning aligned with their own.
V2 — Sliders and Contextual Info
The second version introduced interactive sliders, population averages, and brief metric descriptions to improve usability and reduce friction. This helped users understand what they were adjusting, but the model's reasoning was still unclear and users couldn't easily connect inputs to outcomes.
Problem identified: Better input UX wasn't enough. The gap between what users adjusted and what the model predicted remained invisible.
V3 — Transparent and Decision-Supportive
The final version focused on interpretability and trust. It added:
- Confidence labels — contextualizing the probability score with clinical language
- Similar-case comparisons — showing how this tumor's metrics compare to historical benign and malignant cases
- Feature-level visualization — a bar chart showing which metrics most influenced the result, ranked by importance
These changes transformed the tool from a black-box predictor into a decision-support interface aligned with how radiologists actually reason — and confident enough in its own outputs to explain them.
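As one illustration, the confidence labels could be a simple mapping from probability to clinical language; these thresholds and wordings are assumptions, not the values used in V3:

```python
# Illustrative mapping from a raw probability to clinical framing.
# Thresholds and phrasing are assumed for the sketch, not V3's actual values.
def confidence_label(p_malignant: float) -> str:
    if p_malignant >= 0.90:
        return "High suspicion of malignancy: recommend biopsy review"
    if p_malignant >= 0.60:
        return "Elevated risk: borderline case, compare similar cases"
    if p_malignant >= 0.30:
        return "Indeterminate: model is uncertain, rely on clinical judgment"
    return "Low suspicion: features consistent with benign findings"

print(confidence_label(0.84))  # "Elevated risk: borderline case, ..."
```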
Reflections & Roadmap
Merging Code, Data, and Design
Working on this project was full of surprises. At first, weaving HTML and CSS into Python felt unfamiliar, but that challenge helped me see how code underpins every interface decision. Diving into quantitative research and creating data visualizations revealed diagnostic patterns that static mockups couldn't capture.
Turning my own ML model into an interactive diagnostic app emphasized the power of combining data and design. Watching complex tumor metrics become clear, actionable visuals for radiologists was a true highlight — proof that solid research and thoughtful coding can drive meaningful user experiences.
From Prototype to Practice
The next step is moving this prototype into real-world workflows — recruiting radiologists, medical students, and interdisciplinary care teams for hands-on usability testing. Watching how diagnostic teams navigate the interface will uncover pain points and guide refinements to how we surface feature importance and communicate uncertainty.
From there: iterate on the design based on feedback, then pilot the app in both clinical and educational settings to measure its impact on diagnostic confidence, learning outcomes, and overall workflow efficiency. These steps will ensure the tool truly integrates data-driven insights into everyday diagnostic practice.