Histopathologic Cancer Detection

Objective

The project aims to build a deep learning model that classifies histopathology images as cancerous (malignant) or non-cancerous (benign). This binary classification task uses a Convolutional Neural Network (CNN) built on a pre-trained architecture, which shortens training compared with learning from scratch.

Data Acquisition and Preprocessing

The dataset, obtained from Kaggle, contains over 220,000 labeled images for training and around 57,000 for testing.
An exploratory data analysis (EDA) phase checks the label distribution and finds the classes roughly balanced, with a modest skew toward benign (about 59.5% benign and 40.5% malignant).
Image data generators handle data augmentation and scaling, resizing images to 96x96 pixels and batching them for model input, with a batch size of 64.
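
The preprocessing described above can be sketched with Keras as follows. This is a minimal illustration, not the project's actual code: the flip augmentations, the 1/255 rescaling factor, and the random stand-in arrays are assumptions, and only the 96x96 image size and the batch size of 64 come from the description above.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (96, 96)   # image size stated in the write-up
BATCH_SIZE = 64       # batch size stated in the write-up


def make_generator(augment):
    """Return a generator that rescales pixels to [0, 1].

    The flips used for augmentation are an assumed choice; the write-up
    does not say which augmentations were applied.
    """
    if augment:
        return ImageDataGenerator(
            rescale=1.0 / 255,
            horizontal_flip=True,
            vertical_flip=True,
        )
    return ImageDataGenerator(rescale=1.0 / 255)


# Random stand-in data so the sketch runs without the Kaggle files.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(128, *IMG_SIZE, 3)).astype("float32")
labels = rng.integers(0, 2, size=128)

train_batches = make_generator(augment=True).flow(
    images, labels, batch_size=BATCH_SIZE, shuffle=True, seed=0
)
x_batch, y_batch = next(train_batches)
print(x_batch.shape, y_batch.shape)
```

With image files on disk, `flow_from_dataframe` or `flow_from_directory` called with `target_size=(96, 96)`, `batch_size=64`, and `class_mode="binary"` would take the place of `flow` and would also perform the resizing.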

Model Architecture

Initial Model: The project began with a pre-trained ResNet50 model; however, this approach yielded low accuracy and required considerable training time.
Final Model: The project then switched to MobileNetV2, a lighter architecture that trains faster, and froze the base model’s layers to retain the features pre-trained on ImageNet.
Custom layers were added on top of the base MobileNetV2 model to adapt it specifically for binary classification in this cancer detection context.
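
A frozen MobileNetV2 base with a small classification head might look like the sketch below. The specific head (global average pooling, dropout of 0.3, a single sigmoid unit) is an assumption, since the write-up does not list the custom layers; the frozen ImageNet base, the 96x96 input, and the learning rate of 0.00001 are taken from this project's description. The function name `build_model` is hypothetical.

```python
import tensorflow as tf


def build_model(weights="imagenet", learning_rate=1e-5):
    """Build a binary classifier on top of a frozen MobileNetV2 base."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3),
        include_top=False,   # drop the 1000-class ImageNet head
        weights=weights,
    )
    base.trainable = False   # keep the pre-trained features fixed

    inputs = tf.keras.Input(shape=(96, 96, 3))
    x = base(inputs, training=False)                 # batch norm stays in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # assumed head layer
    x = tf.keras.layers.Dropout(0.3)(x)              # assumed regularization
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use, while `build_model(weights=None)` skips the download and is enough for a quick structural check. Because the base is frozen, only the final dense layer's kernel and bias are updated during training.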
Insights and Approach Validation: With the learning rate set to 0.00001 and training limited to 20 epochs, the model reached a training accuracy of 0.88 (AUC 0.9623) and a validation accuracy of 0.97 (AUC 0.99), which suggests it generalizes well to held-out data. On the Kaggle test set, the submission received a private score of 0.9148 and a public score of 0.9448. Kaggle evaluates this competition by area under the ROC curve, so these scores are best read as AUC values rather than as the share of test images classified correctly. Using MobileNetV2 as a frozen base with custom layers on top proved efficient, which supports transfer learning as a practical approach to histopathologic cancer detection. These results suggest the model could be reliable and applicable in diagnostic contexts.
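
Since the results above report both accuracy and AUC, a small worked example may help show that the two metrics answer different questions. The sketch below uses made-up labels and scores, not project data, and the helper names `roc_auc` and `accuracy` are hypothetical. AUC is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
def roc_auc(y_true, y_score):
    """Pairwise (rank-based) AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


# Toy data: every positive outranks every negative, but all scores are below 0.5.
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]

auc_value = roc_auc(y_true, y_score)
acc_value = accuracy(y_true, y_score)
print(auc_value, acc_value)
```

In this toy case the ranking is perfect, so the AUC is 1.0, yet accuracy at a 0.5 threshold is only 0.5 because every example is predicted negative. An AUC near 0.91 therefore does not by itself mean that 91% of images were classified correctly.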