MAPLES-DR Dataset Description

The project was originally motivated in 2018 by the absence of large public fundus datasets providing pixel-wise annotations of both anatomical and pathological structures of the retina. Such datasets are especially valuable in a context where the deep learning models that achieved state-of-the-art diagnosis of diabetic retinopathy (DR) were criticized for their lack of explainability or interpretability.

We designed MAPLES-DR to address this gap by providing pixel-wise annotations of anatomical structures (optic disc, macula, retinal vessels) and pathological lesions (microaneurysms, hemorrhages, neovascularizations, exudates, cotton wool spots, drusen) for 198 images of the well-known MESSIDOR public dataset. The annotations were performed by a team of seven senior retinologists from hospitals in Toronto and Montréal (Canada).

MAPLES-DR also includes diagnoses for DR and macular edema (ME) following Canadian teleophthalmology screening guidelines [1]. These diagnoses are closer to international standards [2] [3] than MESSIDOR’s original ones.

Labels Description

Figure: Overview of MAPLES-DR labels (MAPLES-DR_Content_Overview.svg).

Segmentation of Anatomical structures

Anatomical structures are present in all images, including healthy ones, but their appearance and their proximity to lesions provide valuable diagnostic information.

Retinal vessels are indicative of the stage of DR: an increase in arteriolar tortuosity is associated with mild and moderate stages [4], while venous beading and dilation are symptoms of severe proliferative stages. The vascular tree is also used as a reference to assess the readability of an image.

The optic disc, optic cup, and macula are also included in MAPLES-DR. Their purpose for diagnosis is two-fold. First, ME is graded by counting the number of lesions within one or two optic disc diameters from the macula, which requires the annotation of both these anatomical structures. Similarly, clinical definitions of DR severity often distinguish four quadrants by dividing the retina horizontally by a line through the fovea and optic disc (superior / inferior division) and vertically by a line through the fovea (temporal / nasal division) [5]. Second, the positions of the lesions in relation to these healthy structures may indicate different etiologies and severities. For example, clinical guidelines sometimes distinguish between disc neovascularization and other neovascularization.
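The distance criterion described above can be illustrated with a minimal sketch. It only shows the geometric test (counting lesion centers that fall within one or two optic disc diameters of the macula center) and is not the grading rule itself, which is defined in the guidelines [1]. The function name, the coordinate convention, and the sample values are hypothetical.

```python
import math


def count_lesions_near_macula(macula_center, disc_diameter, lesion_centers):
    """Count lesions within one and within two disc diameters of the macula.

    All coordinates and the disc diameter are expressed in pixels.
    Returns a tuple (count_within_one_diameter, count_within_two_diameters).
    """
    within_one = 0
    within_two = 0
    for x, y in lesion_centers:
        distance = math.hypot(x - macula_center[0], y - macula_center[1])
        if distance <= disc_diameter:
            within_one += 1
        if distance <= 2 * disc_diameter:
            within_two += 1
    return within_one, within_two


# Hypothetical values, chosen only for illustration.
macula = (750.0, 750.0)
disc_diameter = 200.0
lesions = [(760.0, 740.0), (1000.0, 750.0), (750.0, 1200.0)]
counts = count_lesions_near_macula(macula, disc_diameter, lesions)
```

In practice the macula center and the disc diameter would have to be estimated from the corresponding segmentation maps, which is why both structures need to be annotated.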

Warning

One sample of the train set is centered on the optic disc instead of the macula, which is not visible in this image. No segmentation of the macula is therefore provided for this image.

Similarly, four samples of the train set and two samples of the test set are missing a segmentation of the optic cup because their boundaries were judged too ambiguous by the retinologists.

Trying to access these segmentation maps will return a blank mask and raise a warning. To exclude these images from your dataset, specify maples_dr.configure(exclude_missing_macula=True) or maples_dr.configure(exclude_missing_cup=True).

Segmentation of Red lesions

Diabetes mellitus affects the walls of the vessels, eventually causing microvascular dysfunctions that manifest in the retina as microaneurysms, hemorrhages, intraretinal microvascular abnormalities (IRMA), or neovessels. We refer to these pathological structures as “red lesions”.

Microaneurysms appear as small circular dilations of the capillaries. They are early signs of microvascular dysfunction and are commonly used to detect mild DR.

Intraretinal hemorrhages develop in more advanced stages of the pathology and are divided into dot or blot hemorrhages. Dot hemorrhages appear as circular and well-defined spots and are typically caused by the rupture of a microaneurysm. Distinguishing them from microaneurysms is challenging, and only fluorescein angiography (FA) can differentiate the two with complete certainty. Blot hemorrhages are larger and have less defined borders. Both were annotated simply as hemorrhages in MAPLES-DR. Clinical practice also recognizes superficial (flame-shaped) and vitreous hemorrhages that appear in the most severe stages of retinopathy; none were found in the MAPLES-DR dataset.

Starting from the moderate non-proliferative stage (R2), irregular intraretinal vessels can appear, referred to as IRMA. The next stage of the disease (R3) coincides with even more extensive intraretinal changes, which are precursors to worsening of the disease. Indeed, the presence of IRMA indicates a 50% risk of developing neovascularization within one year, corresponding to a transition to the proliferative stage of the disease. Leakages from extensive neovascularization are responsible for preretinal and vitreous hemorrhages that can cause major visual loss. In the fundus image, neovascularizations are difficult to distinguish from IRMA; however, fluorescein angiography may reveal a leakage that serves as a discriminant factor between the two. In the absence of this imaging modality, IRMA are not differentiated from neovascularization in MAPLES-DR.

Segmentation of Bright Lesions

In the severe stages of DR, the retina thickens (edema formation) and hard exudates (also known as lipoprotein exudation) may appear, potentially causing loss of visual acuity. These deposits usually arise from leakage from damaged capillaries. Furthermore, in the case of ischemia, one can observe a blockage in axonal transport (the movement of mitochondria, lipids, proteins, and other substances within the neuron’s body, allowing for its renewal) in the optic nerve fiber layer.

This can lead to the appearance of lesions known as Cotton Wool Spots (CWS), resulting from axoplasmic accumulations. They are characterized by their white appearance and blurry borders. While the principal etiology is diabetic retinopathy, CWS can be observed in other vascular diseases (systemic arterial hypertension, vein obstruction, coagulopathies…).

Finally, MAPLES-DR also provides annotations of drusen. These lesions are more commonly associated with Age-related Macular Degeneration (AMD), with a prevalence varying from 10% (fifth decade of life) to 35% (seventh decade). They usually appear around the macula and are histologically situated at the interface with the Retinal Pigment Epithelium (RPE). They are thought to originate from degenerative products of the RPE cells and to be composed of lipids and glycoproteins. Classifying early-stage AMD depends on estimating the size of the drusen.

DR and ME grades

The DR and ME grades assigned by the retinologists in MAPLES-DR follow the guidelines developed for Canadian teleophthalmology screening. These guidelines distinguish six grades for DR, plus a label (R6) for images of insufficient quality:

R0: absent

R1: mild

R2: moderate

R3: severe

R4A: proliferative

R4S: stable treated proliferative

R6: insufficient quality for grading

and three for ME, plus an equivalent label (M6) for images of insufficient quality:

M0: absent

M1: mild

M2: moderate

M6: insufficient quality for grading

Grades are defined systematically by the number and position of visible red and bright retinal lesions. Each grade is associated with a recommended course of action (from rescreening in two years for mild cases, to immediate referral to an ophthalmologist for the more severe ones).

A detailed definition of the grading system can be found in this paper [1].
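As a convenience when handling these labels in code, the grade codes listed above can be kept in lookup tables. The following is a minimal sketch: the code strings and descriptions are those of the lists above, while the helper names are hypothetical and the way grades are actually encoded in the distributed files is not assumed here.

```python
# Grade codes and descriptions copied from the lists above.
DR_GRADES = {
    "R0": "absent",
    "R1": "mild",
    "R2": "moderate",
    "R3": "severe",
    "R4A": "proliferative",
    "R4S": "stable treated proliferative",
    "R6": "insufficient quality for grading",
}

ME_GRADES = {
    "M0": "absent",
    "M1": "mild",
    "M2": "moderate",
    "M6": "insufficient quality for grading",
}


def describe_grade(code):
    """Return the description of a DR or ME grade code (case-insensitive)."""
    normalized = code.strip().upper()
    table = DR_GRADES if normalized.startswith("R") else ME_GRADES
    if normalized not in table:
        raise ValueError(f"Unknown grade code: {code!r}")
    return table[normalized]


def is_gradable(code):
    """Return False for the 'insufficient quality' labels R6 and M6."""
    return code.strip().upper() not in {"R6", "M6"}
```

For instance, describe_grade("R4S") returns "stable treated proliferative", and is_gradable("M6") returns False, which can be used to filter ungradable images before computing statistics.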

Data Records

The MAPLES-DR dataset is distributed as two archives: MAPLES-DR.zip and AdditionalData.zip. The first one contains the main data of MAPLES-DR (segmentation maps and grades), while the second one contains additional information on the annotation process (time to complete, comments) as well as intermediate data (pre-annotation maps, grades before consensus…).

MAPLES-DR.zip

MAPLES-DR.zip is the main archive of the dataset. It is split into two folders, train/ and test/, each containing a table diagnosis.csv with the final DR and ME grades as well as 12 folders, one for each biomarker. Those folders store the segmentation maps as binary PNG images following the same naming convention as their MESSIDOR counterparts (e.g. 20051019_38557_0100_PP.png). The train set is composed of 138 images, while the test set contains 60 images.

Figure: Overview of the main MAPLES-DR archive: `MAPLES-DR.zip` (MAPLES-DR_Data_Record_main.svg).

Warning

The image 20051020_55346_0100_PP is centered on the optic disc and its macula is not visible. To raise awareness of this issue, the segmentation map of its macula has been removed from MAPLES-DR.zip/train/Macula/, which therefore only contains 137 images.

Furthermore, for six images (4 in the train set and 2 in the test set), the boundaries of the optic cup were judged too ambiguous by the retinologists for a proper annotation. No optic cup segmentation maps are provided for those images, which reduces the number of images in MAPLES-DR.zip/train/OpticCup/ and MAPLES-DR.zip/test/OpticCup/ to 134 and 58 respectively.
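When working directly with the extracted archive rather than through the maples_dr library, the missing maps can be detected by comparing folder contents. The sketch below assumes only the layout described above (one sub-folder of PNG files per biomarker inside train/ or test/); the function names are hypothetical, and the folder name used as reference must be one that contains a map for every image.

```python
from pathlib import Path


def count_segmentation_maps(split_dir):
    """Return {biomarker_folder_name: number_of_png_files} for one split folder."""
    split_dir = Path(split_dir)
    return {
        folder.name: sum(1 for _ in folder.glob("*.png"))
        for folder in sorted(split_dir.iterdir())
        if folder.is_dir()  # skips files such as diagnosis.csv
    }


def find_missing_maps(split_dir, biomarker, reference):
    """List image names that have a map in `reference` but none in `biomarker`."""
    split_dir = Path(split_dir)
    reference_names = {p.stem for p in (split_dir / reference).glob("*.png")}
    biomarker_names = {p.stem for p in (split_dir / biomarker).glob("*.png")}
    return sorted(reference_names - biomarker_names)
```

Given the counts stated above, count_segmentation_maps applied to the extracted train/ folder would be expected to report 137 files for Macula and 134 for OpticCup, against 138 for folders with no missing map.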

Note

The segmentation maps in MAPLES-DR.zip were resized to match the resolution of the original MESSIDOR images, which vary in dimensions (ranging from 960x1440 up to 1536x2304 pixels). If your application requires a homogeneous resolution (e.g. for training a deep learning model), we advise using the maples_dr Python library to crop and resize the images and the segmentation maps to a uniform resolution.

AdditionalData.zip

The second archive contains all the additional information and data collected during the annotation process. It is also the only archive downloaded and used by the maples_dr library. Indeed, because AdditionalData.zip includes all the annotations and pre-annotations at the resolution at which they were annotated (1500x1500 pixels), as well as the individual diagnoses graded by each retinologist, all the data contained in MAPLES-DR.zip can be derived from it.

Note

Note on the duplicates: initially 200 images were annotated, but we later realized that two of them were duplicates from the original MESSIDOR dataset (same images stored under different names). The 2 duplicates were removed from MAPLES-DR.zip but were kept in AdditionalData.zip for transparency.

Figure: Overview of the additional archive: `AdditionalData.zip` (MAPLES-DR_Data_Record_additional.svg).

AdditionalData.zip contains the following files:

biomarkers_annotations_infos.xls: identifies which Retinologist performed the annotation of a given biomarker category, the Time spent on each annotation (in seconds), any Comment they left, and the Annotation# rank (1 for the first image annotated, 200 for the last).
diagnosis_infos.xls: contains the grades of DR and ME annotated by each Retinologist as well as the consensus they reached. It also includes the comments left by the retinologists while grading.
MESSIDOR-ROIs.csv: provides the bounding boxes extracted from the MESSIDOR images to obtain square regions of interest without blank borders. The bounding boxes are stored as top-left (x0, y0) and bottom-right (x1, y1) coordinates in pixels. This file also includes the original resolution in pixels of the MESSIDOR images: H (height) and W (width).
dataset_record.yaml: a YAML file containing the biomarker names, the test and train split (as lists of image names), and the names of the duplicates associated with the names of their “siblings” in the MAPLES-DR train set.
annotations/: this folder has 12 subfolders (one for each biomarker), each of which contains all 200 segmentation maps annotated by the retinologists (train, test and duplicates). One segmentation map was removed from the Macula folder, and 6 were removed from OpticCup (see warning above). All segmentation maps are stored as binary PNG masks at the resolution at which they were annotated (1500x1500 pixels) using the ROIs provided in MESSIDOR-ROIs.csv.
pre_annotations/: this folder contains the automatic segmentations of Vessels, Exudates, Hemorrhages, and Microaneurysms provided as pre-annotations to the retinologists.
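The relation between the 1500x1500 annotation space and the original MESSIDOR pixel grid can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes that each ROI was linearly rescaled to 1500x1500 pixels and that (x1, y1) is an exclusive bottom-right corner. The function names and the sample ROI are hypothetical.

```python
ANNOTATION_SIZE = 1500  # annotation resolution stated above, in pixels


def annotation_to_original(point, roi, annotation_size=ANNOTATION_SIZE):
    """Map an (x, y) point from annotation space to original image coordinates.

    `roi` is (x0, y0, x1, y1), as stored in MESSIDOR-ROIs.csv.
    Assumes a plain linear rescaling of the ROI to the annotation size.
    """
    x, y = point
    x0, y0, x1, y1 = roi
    scale_x = (x1 - x0) / annotation_size
    scale_y = (y1 - y0) / annotation_size
    return (x0 + x * scale_x, y0 + y * scale_y)


def original_to_annotation(point, roi, annotation_size=ANNOTATION_SIZE):
    """Inverse mapping: original image coordinates to annotation space."""
    x, y = point
    x0, y0, x1, y1 = roi
    return (
        (x - x0) * annotation_size / (x1 - x0),
        (y - y0) * annotation_size / (y1 - y0),
    )


# Hypothetical ROI, chosen only for illustration.
roi = (100, 50, 850, 800)
in_original = annotation_to_original((750, 300), roi)
back_in_annotation = original_to_annotation(in_original, roi)
```

The same scale factors apply when resampling a whole mask; for binary masks, nearest-neighbor interpolation is the usual choice because it keeps the values binary.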

Annotation Procedure

The annotation procedure was co-designed with the team of retinologists to meet three objectives:

Providing an intuitive yet effective annotation tool for the classification and segmentation of biomarkers in fundus images.

Enabling a collaborative effort on common annotations despite the geographical distance between the retinologists and the limited time each could dedicate to this program.

Designing a “scalable” annotation protocol, capable of being extended to much more ambitious annotation campaigns, such as labeling large Canadian telemedicine databases containing tens of thousands of images.

To meet these challenges, we developed a custom web-based annotation platform supporting the following workflow: expert annotators can access the web portal at any time to consult and edit annotations with specialized drawing tools; these annotations and the related information (annotation times, comments) are centralized and stored in a secure database hosted on our laboratory server; as the research team, we assign tasks to annotators, monitor progress, and export annotations via a Python API. The annotation platform (portal, annotation tools, server backend, and Python API) as well as training material for annotators are available on GitHub.

For more details on the annotation process please refer to the MAPLES-DR paper [6].