Khalida

Skills:

Histopathologic Cancer Detection

Objective

The project aims to build a deep learning model that classifies histopathology images as cancerous (malignant) or non-cancerous (benign). This binary classification task uses a Convolutional Neural Network (CNN) and leverages pre-trained architectures to speed up training.

Data Acquisition and Preprocessing

  • The dataset, obtained from Kaggle, contains over 220,000 labeled images for training and around 57,000 for testing.
  • An exploratory data analysis (EDA) phase checks the label distribution and shows a roughly 60/40 split (about 59.5% benign and 40.5% malignant), a mild imbalance rather than an even split.
  • Image data generators handle data augmentation and scaling, resizing images to 96x96 pixels and batching them for model input, with a batch size of 64.
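
The label check and batching arithmetic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the function names `label_distribution` and `steps_per_epoch` are hypothetical, the toy label list is made up to mirror the reported 59.5% / 40.5% split, and 220,000 is a round stand-in for the "over 220,000" training images.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Return the fraction of samples per class label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def steps_per_epoch(num_images, batch_size=64):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


# Toy labels (illustrative only): 0 = benign, 1 = malignant.
toy_labels = [0] * 595 + [1] * 405
distribution = label_distribution(toy_labels)
print(distribution)

# Batches per epoch at the batch size of 64 used in the project,
# assuming a round 220,000 training images.
print(steps_per_epoch(220_000))
```

A check like this is cheap to run before training and makes the class split explicit, which helps when deciding whether accuracy alone is a fair metric.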

Model Architecture

  • Initial Model: The project began with a pre-trained ResNet50 model; however, this approach yielded low accuracy and required considerable training time.
  • Final Model: The project switched to MobileNetV2, a lighter architecture better suited to quick training, and froze the base model’s layers to retain the features pre-trained on ImageNet.
  • Custom layers were added on top of the base MobileNetV2 model to adapt it specifically for binary classification in this cancer detection context.
  • Insights and Approach Validation: With the learning rate tuned to 0.00001 and training reduced to 20 epochs, the model reached a training accuracy of 0.88 with an AUC of 0.9623, and a validation accuracy of 0.97 with an AUC of 0.99, which suggests good generalization to held-out data. On the Kaggle test set, the model reached a private score of 0.9148 and a public score of 0.9448. The competition scores submissions by area under the ROC curve, so these figures are AUC values rather than the share of unseen images classified correctly. Using MobileNetV2 as a frozen base with added custom layers proved efficient, underscoring the effectiveness of transfer learning for histopathologic cancer detection. These results suggest the model has potential for use in diagnostic contexts.
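
The transfer-learning setup described in this section could look roughly like the sketch below, written with the Keras API in TensorFlow. It is an assumption-laden illustration rather than the project's code: the function name `build_classifier`, the head layers (pooling, a 128-unit dense layer, dropout of 0.3) and the Adam optimizer are all hypothetical choices, since the write-up only states that custom layers were added to a frozen MobileNetV2 base and that the learning rate was 0.00001.

```python
def build_classifier(input_shape=(96, 96, 3), learning_rate=1e-5, weights="imagenet"):
    """Frozen MobileNetV2 base plus a small binary-classification head."""
    # Imported inside the function so this file can be loaded
    # even where TensorFlow is not installed.
    import tensorflow as tf

    # Pre-trained convolutional base without its ImageNet classifier head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # keep the pre-trained features fixed

    # Hypothetical custom head; the original layer choices are not stated.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(malignant)
    ])

    # Adam is an assumption; the learning rate matches the write-up.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Training would then be a call such as `build_classifier().fit(train_data, validation_data=val_data, epochs=20)`, where `train_data` and `val_data` stand in for the project's image generators. Tracking AUC alongside accuracy matches the metrics reported above.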