diff --git a/recognition/yolo-47450253/.gitignore b/recognition/yolo-47450253/.gitignore
new file mode 100644
index 000000000..a5802f9c5
--- /dev/null
+++ b/recognition/yolo-47450253/.gitignore
@@ -0,0 +1,5 @@
+data-ISC2018/
+data/
+models/
+runs/
+yolo11n.pt
\ No newline at end of file
diff --git a/recognition/yolo-47450253/README.md b/recognition/yolo-47450253/README.md
new file mode 100644
index 000000000..7c0ce817c
--- /dev/null
+++ b/recognition/yolo-47450253/README.md
@@ -0,0 +1,148 @@
+# Detecting Skin Lesions using YOLOv11
+Since 2016, the International Skin Imaging Collaboration (ISIC) has run annual challenges with the aim of developing imaging tools to aid in the automated diagnosis of melanoma from dermoscopic images. This repository focuses on detecting skin lesions within dermoscopic images, a useful first step whose output could then feed into further diagnostic tools. As specified by the task requirements, this repository makes use of YOLOv11 to perform object detection on the dataset provided by the ISIC.
+
+## Dependencies
+- PyTorch 2.5.1+cu124
+- Torchvision 0.20.1+cu124
+- Ultralytics 8.3.24
+- Opencv-python 4.10.0.84
+
+## About YOLOv11
+YOLOv11 is the 11th major iteration in the "You Only Look Once" family of object detection models and is a very recent development, having been released in September 2024. YOLOv11 provides significant gains in accuracy and small-object detection compared to previous versions, while still maintaining high efficiency and speed. The YOLO family of models is very flexible and can easily be trained to detect objects in a variety of custom datasets with little (occasionally no) fine-tuning, provided that the dataset and labels are supplied in the necessary formats.
+
+### Model Architecture
+YOLOv11 provides significant advancements over older versions of YOLO, with various improvements to the components that make up its architecture, but ultimately follows a very similar structure to its predecessors.
+
+![Architecture Diagram of YOLOv11](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/YOLOv11Architecture.png)
+*Correction: the SPFF block in this diagram should be referred to as SPPF*
+
+The architecture can be broken down into a backbone (the primary feature extractor), a neck (intermediate processing) and a head (prediction), with the stages of each segment comprised of the following blocks:
+
+#### Convolution (Conv)
+A basic convolution block consisting of a Conv2d, a BatchNorm2d and a SiLU activation function.
+
+#### Cross Stage Partial - with kernel size of 2 (C3K2)
+A computationally efficient block that can switch between a more standard implementation of CSP and a mode that extracts more complex features. This block is used frequently throughout the model's architecture to aggregate, process and refine features. It splits the feature maps in two, runs only one part through its layers, and then concatenates the result with the other part, which skips ahead.
+
+#### Spatial Pyramid Pooling - Fast (SPPF)
+This component allows the model to divide the image into a grid and then pool the features of each cell independently, allowing the model to work on different image resolutions. SPPF is a faster version of typical Spatial Pyramid Pooling that trades accuracy for speed.
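+
+As a rough illustration of the two backbone blocks just described, here is a minimal PyTorch sketch of a Conv block and an SPPF module. This is an approximation for intuition only, not the exact Ultralytics implementation; the channel counts and defaults are assumptions.
+
+```python
+import torch
+import torch.nn as nn
+
+class ConvBlock(nn.Module):
+    """Basic YOLO convolution block: Conv2d -> BatchNorm2d -> SiLU."""
+    def __init__(self, c_in, c_out, k=1, s=1):
+        super().__init__()
+        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
+        self.bn = nn.BatchNorm2d(c_out)
+        self.act = nn.SiLU()
+
+    def forward(self, x):
+        return self.act(self.bn(self.conv(x)))
+
+class SPPF(nn.Module):
+    """Applies one 5x5 max-pool three times in sequence and concatenates the
+    intermediate maps, approximating pooling at multiple scales for a
+    fraction of the cost of classic SPP."""
+    def __init__(self, c_in, c_out):
+        super().__init__()
+        c_hidden = c_in // 2
+        self.cv1 = ConvBlock(c_in, c_hidden)
+        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
+        self.cv2 = ConvBlock(c_hidden * 4, c_out)
+
+    def forward(self, x):
+        x = self.cv1(x)
+        y1 = self.pool(x)
+        y2 = self.pool(y1)
+        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))
+```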
+
+#### Attention Mechanism (C2PSA)
+The biggest change between YOLOv11 and its last major predecessor, YOLOv8, this block allows the model to focus on important portions of the image, improving the detection of small or obscured objects. This likely provides significant improvements for this task specifically, due to the potential occlusion of skin lesions by body hair.
+
+### Visualisations By Block Type
+| | |
+|---|---|
+| Start: Stage 0 - Conv Features | Pooling: Stage 9 - SPPF Features|
+|![Conv Features](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/stage0_Conv_features.png)|![SPPF Features](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/stage9_SPPF_features.png)|
+|Attention Mechanism: Stage 10 - C2PSA Features| Final: Stage 22 - C3K2 Features|
+|![C2PSA Features](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/stage10_C2PSA_features.png)|![C3K2 Features](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/stage22_C3k2_features.png)|
+
+## About the ISIC 2018 Dataset
+### Dataset Breakdown
+The [ISIC 2018 Task 1 Dataset](https://challenge.isic-archive.com/data/#2018) comprises 3694 full-colour dermoscopic images broken down into three categories:
+- 2594 Training Images
+- 100 Validation Images
+- 1000 Testing Images
+
+Each of these categories is also accompanied by the same number of black-and-white ground truth masks.
+
+|Dermoscopic Image|Ground Truth|
+|---|---|
+|![Dermoscopic Example](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/ISIC_0000000.jpg)|![Ground Truth Example](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/ISIC_0000000_segmentation.png)|
+
+### Using the Dataset
+
+To use the ISIC 2018 dataset with YOLOv11, preprocessing must be performed and the data must be arranged in the correct locations. To set up the dataset correctly, follow these steps:
+1. Ensure that the variables *ISC2018_TRUTH_TRAIN*, *ISC2018_TRUTH_VALIDATE* and *ISC2018_TRUTH_TEST* in dataset.py point to the locations where the corresponding ground truth files are stored.
+2. Run dataset.py. This creates the required file structure for the YOLOv11 model and converts the ground truth images into labels formatted for use with YOLOv11.
+
+Running dataset.py will produce the following file structure if it does not yet exist:
+```
+data
+├── images
+│   ├── test
+│   ├── train
+│   └── validate
+└── labels
+    ├── test
+    ├── train
+    └── validate
+```
+3. Copy the dermoscopic images into their respective test, train and validate folders.
+
+
+## Training the Model
+Once the dataset is correctly structured, the model can be trained. The model was trained with hyperparameters fairly similar to YOLOv11's default values as a starting point, with plans to fine-tune these values if needed:
+- Epochs: 75
+- Learning Rate: 0.01
+- Momentum: 0.937
+- Weight Decay: 0.0005
+- Batch Size: -1 (this sets the batch size such that it consumes ~60% of total VRAM on the available GPU, in this case 3.66 GB on an RTX 2060)
+
+The model can be trained by running train.py; the core invocation is sketched below.
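+
+For reference, the training call in train.py essentially reduces to the following (the full settings dictionary lives in train.py):
+
+```python
+import torch
+from ultralytics import YOLO
+
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+model = YOLO("yolo11n.pt").to(device)  # start from the pretrained nano weights
+model.train(data="./data.yml", epochs=75, imgsz=640, batch=-1,
+            optimizer="AdamW", lr0=0.01, momentum=0.937,
+            weight_decay=0.0005, project="./models", save_period=10)
+```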
+
+### Training Results
+The model took approximately 40 minutes to train and appeared to produce good results when a validation batch was compared against its labels.
+
+| | |
+|---|---|
+|![Validation Batch Labels](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/val_batch1_labels.jpg)|![Validation Batch Predictions](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/val_batch1_pred.jpg)|
+
+![Training Results](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/training_results.png)
+The graphs refer to the following relevant metrics:
+- box_loss: Box Loss - error in the predicted boxes compared to the labels
+- cls_loss: Class Loss - error in the prediction of classes
+- Precision: proportion of true positive predictions among all predicted positives; measures how prone the model is to false positives.
+- Recall: proportion of true positive predictions among all actual positives; measures the model's ability to find lesions.
+
+
+Inspection of the final metrics on the validation set after training completion showed the following.
+
+|Box Loss|Class Loss|Precision|Recall|
+|---|---|---|---|
+|1.22463|0.45896|0.94926|0.96|
+
+This was a significantly better result than expected, given that the selected hyperparameters were only intended as a starting point that might need further tuning. Considering the computational intensity of YOLO's genetic tuning algorithm and the strong performance already shown on the validation data, it was decided to leave these values as they were.
+
+### Tuning the Model
+While the model was not tuned further for this run, you can obtain optimized hyperparameters by running tune.py. This file runs YOLOv11's hyperparameter optimizer, which uses a genetic algorithm to determine the best hyperparameters for a given dataset. Be aware that even on powerful hardware such as the Nvidia A100, this process can take upwards of 6 hours, even for small datasets.
+
+### Evaluating Against the Test Set
+The model was also validated against the test set, since the dataset provided by ISIC contains ground truths for these images as well. This was performed with an IoU threshold of 0.8 and a confidence threshold of 0.5, as the task specifies these same thresholds for the predictions. This validation resulted in a precision of 0.965 and a recall of 0.939 for the model.
+![Validation results of the model against the test set](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/testvalidate.png)
+
+If you wish to run this for yourself, simply run evaluate.py.
+
+## Predicting Lesions
+Once validation was complete and the model was confirmed to be performing well, it was used to detect lesions in the test dataset.
+
+Here are some predictions and their true labels to compare against:
+
+|Reality|Prediction|
+|---|---|
+|![Reality](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/test_batch2_labels.jpg)|![Model's Prediction](https://github.com/LonelyNo/PatternAnalysis-2024/blob/topic-recognition/recognition/yolo-47450253/images/test_batch2_pred.jpg)|
+
+To perform the prediction on the test set, simply run predict.py.
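+
+If you want to work with the detections programmatically rather than only viewing the annotated images, the results returned by Ultralytics expose the predicted boxes directly. A minimal sketch, assuming the weight and data paths used in this repository:
+
+```python
+from ultralytics import YOLO
+
+model = YOLO("./models/train2/weights/best.pt")
+results = model.predict(source="./data/images/test", conf=0.5, iou=0.8)
+
+for r in results:
+    for box in r.boxes:
+        x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
+        print(f"{r.path}: lesion at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), "
+              f"confidence {float(box.conf):.2f}")
+```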
+
+## Reproduction
+
+To reproduce the results seen above:
+1. Follow the instructions in the *Using the Dataset* section above.
+2. Run train.py and wait for training to complete.
+3. Run predict.py to generate predictions on the test set.
+4. (Optional) Run evaluate.py to generate the exact array of images seen in *Predicting Lesions* and view other metrics regarding the prediction process.
+
+## Conclusion and Improvements
+Overall, the model was able to successfully detect skin lesions within the ISIC 2018 test set to the required parameters of the task. While the selected hyperparameters trained the model well enough to solve the task, taking the time to run the tuning script to generate more optimal parameters would likely result in a model that is even better at the detection task, or at the bare minimum one that can be trained in fewer epochs, thus speeding up training time.
+
+## A note on the lack of modules.py
+As mentioned in Ed Post #336, using Ultralytics to perform this task is an allowed method, and as mentioned in Ed Post #444, this means that all of the functionality that would be stored in modules.py is already provided by the pretrained model itself. As anything added to this file would essentially be a thin wrapper around YOLO's methods, I elected not to include it.
+
+## References
+- [ISIC Challenge Website](https://challenge.isic-archive.com)
+- [ISIC 2018 Task 1 Dataset](https://challenge.isic-archive.com/data/#2018)
+- [YOLOv11 Architecture Explained: Next-Level Object Detection with Enhanced Speed and Accuracy - By S Nikhileswara Rao](https://medium.com/@nikhil-rao-20/yolov11-explained-next-level-object-detection-with-enhanced-speed-and-accuracy-2dbe2d376f71)
+- [YOLOv11: An Overview of the Key Architectural Enhancements - By Rahima Khanam and Muhammad Hussain](https://arxiv.org/html/2410.17725v1)
+- [YOLOv8 Architecture Explained! - By Abin Timilsina](https://abintimilsina.medium.com/yolov8-architecture-explained-a5e90a560ce5)
+- [Ultralytics Docs: Performance Metrics Deep Dive](https://docs.ultralytics.com/guides/yolo-performance-metrics/#object-detection-metrics)
\ No newline at end of file
diff --git a/recognition/yolo-47450253/data.yml b/recognition/yolo-47450253/data.yml
new file mode 100644
index 000000000..b90142e88
--- /dev/null
+++ b/recognition/yolo-47450253/data.yml
@@ -0,0 +1,7 @@
+path: ../data
+train: images/train
+val: images/validate
+test: images/test
+
+names:
+  0: lesion
\ No newline at end of file
diff --git a/recognition/yolo-47450253/dataset.py b/recognition/yolo-47450253/dataset.py
new file mode 100644
index 000000000..6c95b9737
--- /dev/null
+++ b/recognition/yolo-47450253/dataset.py
@@ -0,0 +1,75 @@
+import os
+import cv2 as cv
+
+# Modify these to the paths of your ground truths.
+ISC2018_TRUTH_TRAIN = "./data-ISC2018/ISIC2018_Task1_Training_GroundTruth_x2"
+ISC2018_TRUTH_VALIDATE = "./data-ISC2018/ISIC2018_Task1_Validation_GroundTruth"
+ISC2018_TRUTH_TEST = "./data-ISC2018/ISIC2018_Task1_Test_GroundTruth"
+
+
+OUTPUT_TRAIN = "./data/labels/train"
+OUTPUT_VALIDATE = "./data/labels/validate"
+OUTPUT_TEST = "./data/labels/test"
+IMAGE_TRAIN = "./data/images/train"
+IMAGE_VALIDATE = "./data/images/validate"
+IMAGE_TEST = "./data/images/test"
+CLASS_NO = "0"  # Only testing for 1 class: skin lesions.
+
+"""
+create_label(mask)
+    Finds the contours of a given mask, uses the largest contour to find the
+    bounding rectangle, and then returns this rectangle in YOLO format.
+    mask: (numpy array) an array containing greyscale image data.
+"""
+def create_label(mask):
+    mask_height, mask_width = mask.shape
+    contours, _ = cv.findContours(mask, mode=cv.RETR_TREE, method=cv.CHAIN_APPROX_SIMPLE)
+    if not contours:
+        return None
+    # Take the largest contour by area so that stray specks in a mask cannot
+    # displace the lesion's bounding box.
+    x, y, w, h = cv.boundingRect(max(contours, key=cv.contourArea))
+
+    # Normalize x, y, w, h because cv2's boundingRect stores coordinates in a
+    # top-left format, while YOLO stores its coordinates in a centre format.
+    # Format details: https://docs.ultralytics.com/datasets/detect/#ultralytics-yolo-format
+    x_n = (x + w/2) / mask_width
+    y_n = (y + h/2) / mask_height
+    w_n = w / mask_width
+    h_n = h / mask_height
+    label = f"{CLASS_NO} {x_n:.6f} {y_n:.6f} {w_n:.6f} {h_n:.6f}"
+    return label
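+
+# Worked example for create_label (illustrative numbers, not from the dataset):
+# a 1000x800 mask whose lesion has bounding rectangle x=200, y=100, w=400,
+# h=300 has its centre at (400, 250), giving the label line:
+#   0 0.400000 0.312500 0.400000 0.375000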
+""" +def create_label(mask): + mask_height, mask_width = mask.shape + contours, _ = cv.findContours(mask, mode = cv.RETR_TREE, method=cv.CHAIN_APPROX_SIMPLE) + if not contours: + return None + x, y, w, h = cv.boundingRect(contours[0]) + + #Normalize x, y, w, h because cv2 uses a top left format to store coordinates for bountingRect, yolo stores its coordinates in center format + #Format details can be found here: https://docs.ultralytics.com/datasets/detect/#ultralytics-yolo-format + x_n = (x + w/2) / mask_width + y_n = (y + h/2) / mask_height + w_n = w / mask_width + h_n = h / mask_height + label = f"{CLASS_NO} {x_n:.6f} {y_n:.6f} {w_n:.6f} {h_n:.6f}" + return label + +""" +mask2label(input_path, output_path) + Converts all ground truth masks within a folder to YOLO format labels + and then returns this rectangle in YOLO format + input_path: input location + output_path: output location +""" +def mask2label(input_path, output_path): + for file in os.listdir(input_path): + if file.endswith(".png"): + mask_path = os.path.join(input_path, file) + label_path = os.path.join(output_path, file.replace("_segmentation.png", ".txt")) + + mask = cv.imread(mask_path, cv.IMREAD_GRAYSCALE) + if mask is None: + raise FileNotFoundError("Not Found: " + mask_path) + label = create_label(mask) + if label: + with open(label_path, "w") as f: + f.write(label) + else: + print(file + " is missing contours.") + + +#Make directories needed for processing if they do not already exist +os.makedirs(OUTPUT_TRAIN, exist_ok=True) +os.makedirs(OUTPUT_VALIDATE, exist_ok=True) +os.makedirs(OUTPUT_TEST, exist_ok=True) +os.makedirs(IMAGE_TRAIN, exist_ok=True) +os.makedirs(IMAGE_VALIDATE, exist_ok=True) +os.makedirs(IMAGE_TEST, exist_ok=True) + +#Convert Truths to Labels +mask2label(ISC2018_TRUTH_TRAIN, OUTPUT_TRAIN) +mask2label(ISC2018_TRUTH_VALIDATE, OUTPUT_VALIDATE) +mask2label(ISC2018_TRUTH_TEST, OUTPUT_TEST) diff --git a/recognition/yolo-47450253/evaluate.py b/recognition/yolo-47450253/evaluate.py new file mode 100644 index 000000000..d2c6612a2 --- /dev/null +++ b/recognition/yolo-47450253/evaluate.py @@ -0,0 +1,18 @@ +import torch +from ultralytics import YOLO + +DATA_YML = "./data.yml" +TRAINED_WEIGHTS = "./models/train2/weights/best.pt" + +''' +evaluate() + Evaluates the model in TRAINED_WEIGHTS against the test dataset referenced by DATA_YML. 
+
+if __name__ == '__main__':
+    evaluate()
\ No newline at end of file
diff --git a/recognition/yolo-47450253/images/ISIC_0000000.jpg b/recognition/yolo-47450253/images/ISIC_0000000.jpg
new file mode 100644
index 000000000..0f8e21eb2
Binary files /dev/null and b/recognition/yolo-47450253/images/ISIC_0000000.jpg differ
diff --git a/recognition/yolo-47450253/images/ISIC_0000000_segmentation.png b/recognition/yolo-47450253/images/ISIC_0000000_segmentation.png
new file mode 100644
index 000000000..caa4c0a4d
Binary files /dev/null and b/recognition/yolo-47450253/images/ISIC_0000000_segmentation.png differ
diff --git a/recognition/yolo-47450253/images/YOLOv11Architecture.png b/recognition/yolo-47450253/images/YOLOv11Architecture.png
new file mode 100644
index 000000000..8321e8d2a
Binary files /dev/null and b/recognition/yolo-47450253/images/YOLOv11Architecture.png differ
diff --git a/recognition/yolo-47450253/images/stage0_Conv_features.png b/recognition/yolo-47450253/images/stage0_Conv_features.png
new file mode 100644
index 000000000..9b2f2d47c
Binary files /dev/null and b/recognition/yolo-47450253/images/stage0_Conv_features.png differ
diff --git a/recognition/yolo-47450253/images/stage10_C2PSA_features.png b/recognition/yolo-47450253/images/stage10_C2PSA_features.png
new file mode 100644
index 000000000..7cc997e1a
Binary files /dev/null and b/recognition/yolo-47450253/images/stage10_C2PSA_features.png differ
diff --git a/recognition/yolo-47450253/images/stage22_C3k2_features.png b/recognition/yolo-47450253/images/stage22_C3k2_features.png
new file mode 100644
index 000000000..886139a0a
Binary files /dev/null and b/recognition/yolo-47450253/images/stage22_C3k2_features.png differ
diff --git a/recognition/yolo-47450253/images/stage9_SPPF_features.png b/recognition/yolo-47450253/images/stage9_SPPF_features.png
new file mode 100644
index 000000000..f1ab69dd7
Binary files /dev/null and b/recognition/yolo-47450253/images/stage9_SPPF_features.png differ
diff --git a/recognition/yolo-47450253/images/test_batch2_labels.jpg b/recognition/yolo-47450253/images/test_batch2_labels.jpg
new file mode 100644
index 000000000..ea17ab639
Binary files /dev/null and b/recognition/yolo-47450253/images/test_batch2_labels.jpg differ
diff --git a/recognition/yolo-47450253/images/test_batch2_pred.jpg b/recognition/yolo-47450253/images/test_batch2_pred.jpg
new file mode 100644
index 000000000..eea6119b7
Binary files /dev/null and b/recognition/yolo-47450253/images/test_batch2_pred.jpg differ
diff --git a/recognition/yolo-47450253/images/testvalidate.png b/recognition/yolo-47450253/images/testvalidate.png
new file mode 100644
index 000000000..51bde8ffa
Binary files /dev/null and b/recognition/yolo-47450253/images/testvalidate.png differ
diff --git a/recognition/yolo-47450253/images/training_results.png b/recognition/yolo-47450253/images/training_results.png
new file mode 100644
index 000000000..5f94098f0
Binary files /dev/null and b/recognition/yolo-47450253/images/training_results.png differ
diff --git a/recognition/yolo-47450253/images/val_batch1_labels.jpg b/recognition/yolo-47450253/images/val_batch1_labels.jpg
new file mode 100644
index 000000000..8d343aa53
Binary files /dev/null and b/recognition/yolo-47450253/images/val_batch1_labels.jpg differ
diff --git a/recognition/yolo-47450253/images/val_batch1_pred.jpg b/recognition/yolo-47450253/images/val_batch1_pred.jpg
new file mode 100644
index 000000000..e883188ce
Binary files /dev/null and b/recognition/yolo-47450253/images/val_batch1_pred.jpg differ
diff --git a/recognition/yolo-47450253/predict.py b/recognition/yolo-47450253/predict.py
new file mode 100644
index 000000000..4268f165f
--- /dev/null
+++ b/recognition/yolo-47450253/predict.py
@@ -0,0 +1,23 @@
+import torch
+from ultralytics import YOLO
+
+TRAINED_WEIGHTS = "./models/train2/weights/best.pt"
+TEST_SET = "./data/images/test"
+
+'''
+predict()
+    Performs predictions on the test image set (TEST_SET) using a given trained
+    model (TRAINED_WEIGHTS). Saves annotated images to runs/detect/predictN.
+'''
+def predict():
+    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # default to GPU when available
+    model = YOLO(TRAINED_WEIGHTS).to(device)
+
+    results = model.predict(source=TEST_SET,
+                            imgsz=640,
+                            conf=0.5,
+                            iou=0.8,
+                            save=True)  # save=True so the annotated images described above are written out
+
+
+if __name__ == '__main__':
+    predict()
\ No newline at end of file
diff --git a/recognition/yolo-47450253/train.py b/recognition/yolo-47450253/train.py
new file mode 100644
index 000000000..ae50c2fa8
--- /dev/null
+++ b/recognition/yolo-47450253/train.py
@@ -0,0 +1,35 @@
+import torch
+from ultralytics import YOLO
+
+
+DATA_YML = "./data.yml"
+OUTPUT = "./models"
+
+SETTINGS = {
+    'data': DATA_YML,
+    'epochs': 75,
+    'imgsz': 640,
+    'workers': 5,
+    'batch': -1,
+    'lr0': 0.01,
+    'optimizer': "AdamW",
+    'momentum': 0.937,
+    'weight_decay': 0.0005,
+    'project': OUTPUT,
+    'save_period': 10
+    }
+
+'''
+train()
+    Trains a model using the given dataset (DATA_YML) and saves it to OUTPUT.
+    Hyperparameters can be changed in the SETTINGS dictionary.
+'''
+def train():
+    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # default to GPU when available
+    model = YOLO("yolo11n.pt").to(device)
+
+    results = model.train(**SETTINGS)
+
+
+if __name__ == '__main__':
+    train()
\ No newline at end of file
diff --git a/recognition/yolo-47450253/tune.py b/recognition/yolo-47450253/tune.py
new file mode 100644
index 000000000..50c40d8a6
--- /dev/null
+++ b/recognition/yolo-47450253/tune.py
@@ -0,0 +1,21 @@
+import torch
+from ultralytics import YOLO
+
+
+DATA_YML = "./data.yml"
+OUTPUT = "./models"
+
+device = 'cuda' if torch.cuda.is_available() else 'cpu'  # default to GPU when available
+model = YOLO("yolo11n.pt").to(device)
+
+settings = {
+    'data': DATA_YML,
+    'epochs': 30,
+    'imgsz': 640,
+    'batch': -1,
+    'iterations': 50
+    }
+
+
+if __name__ == '__main__':
+    results = model.tune(**settings)
\ No newline at end of file