Border irregularity loss for automated segmentation of primary brain lymphomas on post-contrast MRI

Unlike for other brain tumors, there has been little work on the automatic segmentation of primary central nervous system (CNS) lymphomas. This is a challenging task due to the highly variable pattern of the tumor and its boundaries. In this work, we propose a new loss function that controls border irregularity for deep learning-based automatic segmentation of primary CNS lymphomas. We introduce a border irregularity loss, which is based on the comparison of the segmentation and its smoothed version. The border irregularity loss is combined with a previously proposed topological loss to better control the different connected components. The approach is general and can be used with any segmentation network. We studied a population of 99 patients with primary CNS lymphoma. Forty patients were isolated from the very beginning to form the independent test set. The segmentations were performed on post-contrast T1-weighted MRI. The MRI were acquired in clinical routine and were highly heterogeneous. The proposed approach substantially outperformed the baseline across the various evaluation metrics (by 6 percentage points of Dice, 40 mm of Hausdorff distance and 6 mm of mean average surface distance). However, the overall performance was moderate, highlighting that automatic segmentation of primary CNS lymphomas is a difficult task, especially when dealing with clinical routine MRI. The code is publicly available here: https://github.com/rosanajurdi/LymphSeg.


INTRODUCTION
Primary central nervous system (CNS) lymphoma (PCNSL) is a rare and aggressive type of cancer that primarily affects the brain and spinal cord. It accounts for approximately 4% of newly diagnosed primary CNS tumors and 1% of all non-Hodgkin lymphomas (NHL). Magnetic resonance imaging (MRI) suggests the diagnosis, is needed to plan the tumor biopsy, and plays a pivotal role in post-treatment assessment of PCNSL. Precise tumor quantification is highly desirable and is a necessary first step toward better patient care. It could aid in pre-surgical planning and in objective evaluation of tumor response. 1 Manual segmentation, however, is time-consuming and partly subjective.
[...] 4-11 However, while many papers have dealt with other types of tumors such as gliomas, 12 only a limited number of studies have focused on brain lymphomas. Some works focused on classification 13,14 or survival analysis. 15 Automatic segmentation of PCNSL from MRI data has been performed in a few publications. 16,17 However, these works were not specific to lymphoma and also included gliomas for training or validation. Moreover, they did not report distance-based evaluation metrics, which are critical for lymphomas because these tumors have complex boundaries and multiple components.
PCNSL has an extremely heterogeneous MRI appearance. Lesions can be single or multiple, and several morphologies, topographies, and mass-effect patterns can (co)exist. As a result, tumor size, shape and boundaries can vary considerably between patients.
These variations make accurate segmentation of lymphomas particularly challenging. Convolutional neural network (CNN)-based segmentation approaches are widely used but often produce errors near boundaries. To address this issue, methods that integrate prior knowledge at the level of the loss function have been explored to enhance the plausibility of automatic segmentations. 18 In this paper, we propose a novel prior-based loss that integrates the border irregularity of the tumor in order to improve segmentation performance. The proposed method was trained and validated for segmentation of PCNSL on post-contrast T1-weighted MR images using a dataset of 99 patients.
The rest of the paper is organized as follows. Section 2 describes the dataset and preprocessing. In Section 3, we introduce the proposed loss. Section 4 is devoted to experiments and results. The discussion is provided in Section 5.

Participants and MRI data
We studied 118 patients with primary CNS lymphoma. The study was approved by the Institutional Ethical Committee (Pitié Salpêtrière Hospital, Ile-de-France VI, n°DC-2009-957) and by the French Data Protection Authority (CNIL, Commission Nationale de l'Informatique et des Libertés, DR 2013-279). According to French regulation, consent was waived because these images were acquired as part of the routine clinical care of the patients. Each patient had a T1-weighted MRI after injection of gadolinium. The images were acquired in clinical routine and were thus not harmonized. They were acquired on different scanners and at different field strengths (57 at 3T, 54 at 1.5T, and 7 at 1T). The MRI scan was either 3D or 2D. Manual segmentations were performed by a trained radiologist (L.N.), who also rated the difficulty of the segmentation process. The segmentation was considered difficult when the lymphoma tissue was less extended, when the lesion boundaries were difficult to visualize, and/or, in rarer cases, when there were hemorrhagic remnants that are spontaneously T1 hyperintense and can therefore mimic tumors. Furthermore, the radiologist noted whether the images presented substantial artifacts. This led us to partition the dataset into four subsets, D1, D2, D3 and D4, according to these two criteria (D1: easy, no artifact; D2: easy, artifacts; D3: difficult, no artifact; D4: difficult, artifacts).
We applied the following preprocessing. All images were converted from DICOM to NIfTI using dcm2niix 19 and organized according to the Brain Imaging Data Structure (BIDS) standard. 20 Using FSL FLIRT, 21 we linearly registered each image to the MNI-152 template, which has 1 mm isotropic resolution and dimensions of 181 × 217 × 181. 22,23 We then applied a brain mask to remove unnecessary information such as the skull, nose, and ears. Pydra was used to implement the preprocessing steps. 24 We visually checked the preprocessing results and found that preprocessing failed for 19 patients (9 in D1, 3 in D2, 3 in D3 and 4 in D4). These patients were excluded from the study. The characteristics of the studied patients and the corresponding subsets are presented in Table 1.
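A minimal sketch of the registration and masking steps in plain Python (not Pydra), assuming FSL is installed and `flirt` is on the PATH. The flags `-in`, `-ref`, `-out`, `-omat` and `-dof` are standard FLIRT options, but choosing 12 degrees of freedom (full affine) and all file names are assumptions: the text only says the registration was linear and does not state how the brain mask was obtained, so here the mask is simply passed in as an array.

```python
import shlex

import numpy as np


def flirt_command(moving, reference, out_image, out_matrix, dof=12):
    """Build an FSL FLIRT call for linear registration to a template.

    dof=12 (affine) is an assumption; 6 (rigid) or 9 would also be linear.
    """
    return [
        "flirt",
        "-in", moving,
        "-ref", reference,
        "-out", out_image,
        "-omat", out_matrix,
        "-dof", str(dof),
    ]


def apply_brain_mask(volume, mask):
    """Zero every voxel outside the brain mask (skull, nose, ears)."""
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must be on the same (MNI) grid")
    return np.where(mask > 0, volume, 0).astype(volume.dtype)


if __name__ == "__main__":
    # Hypothetical BIDS-style file names; the template path is a placeholder.
    cmd = flirt_command("sub-01_ce-gd_T1w.nii.gz", "mni152_1mm.nii.gz",
                        "sub-01_space-MNI_T1w.nii.gz", "sub-01_to_MNI.mat")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Building the command as a list rather than a shell string avoids quoting problems with unusual file names.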

Proposed border irregularity loss
To include border irregularity in a loss function, one must be able to quantify the border irregularity of an object. One approach is to smooth the segmentation map with a Gaussian filter of kernel size s and standard deviation σ until it becomes more uniform in shape, and then to compare the smoothed and original segmentations. 25 A segmentation with a greater degree of irregularity requires stronger smoothing, resulting in a higher index of border irregularity, whereas a smoother segmentation yields a lower index. Applying a Gaussian filter to a segmentation map is a well-established computer vision technique for achieving smoothness and extracting the global structure of the map without generating new irregularities, indentations, or protrusions. The irregularity index of 25 is computed from $A^*$ and $A$, the smoothed and non-smoothed segmentation maps, respectively, and represents the level of dissimilarity between them. Another way to express the dissimilarity between two segmentations is the complement of the Dice coefficient. Specifically, the Dice coefficient can be used to derive a border irregularity index as

$$I_b = 1 - \mathrm{Dice}(A, A^*) = 1 - \frac{2\,|A \cap A^*|}{|A| + |A^*|}.$$

In this paper, we use $I_b$ as the measure of border irregularity at the loss-function level, since it leads directly to a differentiable loss with smooth gradients. We minimize the difference between the ground-truth border irregularity index $I_b$ and the predicted border irregularity index $\hat{I}_b$. The impact of the smoothing procedure on the segmentation is determined by two factors: the smoothing level (σ) and the border irregularity. A segmentation map with severe irregularity requires more intense smoothing, i.e., a higher σ, whereas a smoother segmentation requires a lower σ. In this work, the smoothed maps for the ground-truth segmentations are computed statically before training and fed to the framework in order to compute the border irregularity of the prediction maps.
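The index $I_b = 1 - \mathrm{Dice}(A, A^*)$ and a loss built on it can be sketched in NumPy as below. This is an illustration under stated assumptions, not the released LymphSeg code: the soft Dice with an `eps` term, the zero-padded separable Gaussian, and the absolute difference used as "the difference" between $I_b$ and $\hat{I}_b$ are all choices made here. Every operation (convolution, products, sums) is differentiable, so the same steps could be written in a deep learning framework for training.

```python
import numpy as np


def gaussian_kernel1d(sigma, size):
    """Normalized 1D Gaussian with `size` taps centred on the middle."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def gaussian_smooth2d(seg, sigma, size=5):
    """Separable Gaussian smoothing of a 2D map (zero padding, same shape).

    Assumes both image dimensions are at least `size`.
    """
    k = gaussian_kernel1d(sigma, size)
    out = np.apply_along_axis(np.convolve, 1, seg, k, mode="same")  # rows
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")  # columns


def soft_dice(a, b, eps=1e-7):
    """Soft Dice between two maps with values in [0, 1]."""
    return (2.0 * np.sum(a * b) + eps) / (np.sum(a) + np.sum(b) + eps)


def border_irregularity(seg, smoothed):
    """I_b = 1 - Dice(A, A*)."""
    return 1.0 - soft_dice(seg, smoothed)


def bil_loss(pred, pred_smoothed, gt_index):
    """|I_b(prediction) - I_b(ground truth)|; the L1 form is an assumption."""
    return abs(border_irregularity(pred, pred_smoothed) - gt_index)


if __name__ == "__main__":
    square = np.zeros((40, 40))
    square[10:30, 10:30] = 1.0
    stripes = square.copy()
    stripes[10:30, 10:30:2] = 0.0  # comb-like, highly irregular border
    for name, seg in [("square", square), ("stripes", stripes)]:
        print(name, border_irregularity(seg, gaussian_smooth2d(seg, 2.0)))
```

With the same σ, the comb-like shape loses much more overlap after smoothing than the solid square, so it receives a clearly higher $I_b$.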
The process can be summarized as follows. Irregularities are smoothed out gradually and systematically by applying an increasing σ: smaller irregularities are eliminated first, followed by larger ones. As some indentations or protrusions are smoothed, they may reveal a larger irregularity at the same location. This larger irregularity is considered a "global" irregularity, while the smaller ones are regarded as "local" irregularities. Consequently, varying σ establishes a hierarchy of irregularities.

Segmentation smoothing module
The smoothing module was implemented with a Gaussian kernel of size k ∈ {5, 10} and standard deviation σ. A variable standard deviation $\sigma = 2^{x}$, x ∈ {0, 1, 2, ..., 10}, was applied iteratively. The smoothed segmentation map is obtained when $I_b$ saturates, i.e., when it keeps the same value over two consecutive iterations. At that point, the smoothing process has reached a stable state and further iterations are unlikely to change the result. This stopping condition, together with the maximum value of x, bounds the number of iterations and avoids unnecessary computation.
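The schedule can be sketched as a loop over $\sigma = 2^{x}$ that stops once $I_b$ no longer changes. Reading "saturates over two iterations" as $|I_b(x) - I_b(x-1)| < \text{tol}$ is an interpretation, and `tol` is a hypothetical parameter; the helper names are likewise illustrative. With a fixed kernel size, a very large σ makes the kernel nearly flat, which is why $I_b$ eventually stops changing.

```python
import numpy as np


def _smooth(seg, sigma, size):
    """Separable, zero-padded Gaussian smoothing of a 2D map (same shape)."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    out = np.apply_along_axis(np.convolve, 1, seg, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")


def irregularity(seg, smoothed, eps=1e-7):
    """I_b = 1 - soft Dice(seg, smoothed)."""
    dice = (2.0 * np.sum(seg * smoothed) + eps) / (np.sum(seg) + np.sum(smoothed) + eps)
    return 1.0 - dice


def smooth_until_stable(seg, kernel_size=5, max_exponent=10, tol=1e-4):
    """Try sigma = 2**x for x = 0..max_exponent and stop when I_b saturates.

    Returns (smoothed map, I_b, sigma). If I_b never saturates, the result
    for the largest sigma is returned.
    """
    prev_index = None
    for x in range(max_exponent + 1):
        sigma = 2.0 ** x
        smoothed = _smooth(seg, sigma, kernel_size)
        index = irregularity(seg, smoothed)
        if prev_index is not None and abs(index - prev_index) < tol:
            break  # same I_b over two consecutive iterations
        prev_index = index
    return smoothed, index, sigma


if __name__ == "__main__":
    blob = np.zeros((40, 40))
    blob[10:30, 10:30] = 1.0
    _, i_b, sigma = smooth_until_stable(blob, kernel_size=5)
    print(f"I_b saturated at {i_b:.4f} for sigma = {sigma}")
```

Because the ground-truth smoothed maps are computed before training, this loop only has to run once per ground-truth segmentation.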

Model and implementation details
We zero-padded the data to a size of (184, 220, 184). 26 The MRI intensities were normalized between 0 and 1. The border irregularity loss was used either with the Dice loss only (results denoted as BIL) or in combination with our previously proposed topological loss 27 and the Dice loss (results denoted as BIL-Topo). The proposed approaches were compared to a baseline trained with the Dice loss alone (results denoted as Baseline). Each 3D volume was processed as a stack of independent 2D images. The network was a 2D U-Net 28 whose architecture has been used in previous publications. 29,30 It is a 3-stage structure composed of convolutional and de-convolutional blocks, a bottleneck, and skip connections. The encoder is composed of convolutional and batch normalization layers, whereas the decoder is composed of 2 consecutive convolutional blocks and an upsampling layer in each of the 3 stages. The bottleneck is composed of 2 convolutional blocks separated by a residual block. 31 The optimizer was Adam with a learning rate of 0.001. The learning rate was halved if the validation performance did not improve over 20 epochs, following previous work. 32 We used a batch size of 8.
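The input preparation (zero-padding from the 181 × 217 × 181 MNI grid to (184, 220, 184), intensity normalization to [0, 1], and splitting into 2D slices) can be sketched as follows. How the padding is split between the two ends of each axis, per-volume min-max normalization, and slicing along the last (axial) axis are assumptions, since the text does not specify them.

```python
import numpy as np

TARGET_SHAPE = (184, 220, 184)


def zero_pad(volume, target_shape=TARGET_SHAPE):
    """Zero-pad each axis up to target_shape (extra voxel after; assumed split)."""
    pads = []
    for size, target in zip(volume.shape, target_shape):
        if size > target:
            raise ValueError(f"axis of size {size} exceeds target {target}")
        total = target - size
        pads.append((total // 2, total - total // 2))
    return np.pad(volume, pads, mode="constant", constant_values=0)


def minmax_normalize(volume, eps=1e-8):
    """Rescale intensities of one volume to [0, 1]."""
    vmin, vmax = float(volume.min()), float(volume.max())
    return (volume - vmin) / max(vmax - vmin, eps)


def to_slices(volume):
    """Stack of 2D slices for a 2D U-Net, slicing along the last axis."""
    return np.moveaxis(volume, -1, 0)


if __name__ == "__main__":
    mni_volume = np.random.rand(181, 217, 181).astype(np.float32) * 500.0
    x = minmax_normalize(zero_pad(mni_volume))
    print(x.shape, to_slices(x).shape)  # (184, 220, 184) (184, 184, 220)
```

Padding to sizes divisible by 8 lets the three down-sampling stages of the U-Net halve each spatial dimension without remainder.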
At inference, we predict each slice independently and then stack the slices belonging to the same patient to form a 3D prediction.

Table 2. Results on the whole test set and separately for the D1, D2, D3 and D4 test sets. Results are presented as mean and confidence interval (computed using bootstrapping with 15,000 resamplings). D1+D2+D3+D4 refers to the union of the D1, D2, D3 and D4 test sets. HD: 95% 3D Hausdorff distance. MASD: mean average surface distance. n is the number of samples. Best result in each case is in bold. * indicates that the improvement over the baseline is statistically significant.
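The slice-by-slice inference described in the model section can be sketched as follows. `predict_slice` is a hypothetical stand-in for the trained 2D U-Net (any callable mapping an (H, W) slice to an (H, W) probability map). Slicing along the last axis and the 0.5 threshold are assumptions.

```python
import numpy as np


def predict_volume(volume, predict_slice, threshold=0.5):
    """Predict every 2D slice independently, then restack into a 3D mask."""
    probs = [predict_slice(volume[..., z]) for z in range(volume.shape[-1])]
    prob_volume = np.stack(probs, axis=-1)  # same axis the slices came from
    return (prob_volume >= threshold).astype(np.uint8)


if __name__ == "__main__":
    vol = np.zeros((8, 9, 10))
    vol[2:5, 3:6, 4:7] = 0.9
    # Identity "model", only to exercise the slicing and stacking logic.
    mask = predict_volume(vol, lambda s: s)
    print(mask.shape, int(mask.sum()))
```

Because each slice is predicted without its neighbours, nothing enforces consistency along the slicing axis. This is one reason why 3D metrics can reveal spurious connected components.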

Evaluation framework
Following the recommendations of Reinke et al., 33 we chose the following performance metrics: the 3D Dice coefficient, the 95% 3D Hausdorff distance (HD), and the mean average surface distance (MASD). The results are reported as mean values along with their 95% confidence intervals, computed using bootstrapping with 15,000 resamplings.
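The three metrics can be sketched in NumPy for binary masks on the 1 mm MNI grid. Several conventions exist, and the ones used here are assumptions. Surfaces are taken as foreground voxels with a background face-neighbour. HD95 is the larger of the two directed 95th percentiles, and MASD is the average of the two directed mean surface distances. The brute-force pairwise distance is only suitable for small examples; for full volumes a Euclidean distance transform is the usual choice.

```python
import numpy as np


def dice(a, b):
    """3D Dice between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def surface_points(mask):
    """Coordinates of foreground voxels with at least one background face-neighbour."""
    m = np.pad(mask.astype(bool), 1)
    interior = m.copy()
    for axis in range(m.ndim):
        interior &= np.roll(m, 1, axis=axis) & np.roll(m, -1, axis=axis)
    return np.argwhere(m & ~interior) - 1  # undo the padding offset


def _directed(p, q, spacing):
    """For each point of p, distance (mm) to the closest point of q."""
    d = (p[:, None, :] - q[None, :, :]) * np.asarray(spacing, dtype=float)
    return np.sqrt((d ** 2).sum(axis=-1)).min(axis=1)


def hd95_and_masd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Return (HD95, MASD) in mm; both masks must be non-empty."""
    pa, pb = surface_points(a), surface_points(b)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("surface distances are undefined for empty masks")
    d_ab, d_ba = _directed(pa, pb, spacing), _directed(pb, pa, spacing)
    hd95 = max(np.percentile(d_ab, 95), np.percentile(d_ba, 95))
    masd = 0.5 * (d_ab.mean() + d_ba.mean())
    return float(hd95), float(masd)


if __name__ == "__main__":
    gt = np.zeros((20, 20, 20), bool)
    gt[5:15, 5:15, 5:15] = True
    pred = np.zeros_like(gt)
    pred[7:17, 5:15, 5:15] = True  # same cube shifted by 2 mm
    print(dice(gt, pred), hd95_and_masd(gt, pred))
```

A missed tumor component or a spurious one far from the true lesion barely moves the Dice but can dominate HD95. This is why distance-based metrics matter for multifocal lymphomas.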
Since the dataset contains both 3D and 2D MRI scans, we report these statistics for all patients combined as well as separately for patients with 2D and with 3D acquisitions.
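A minimal percentile-bootstrap sketch for the reported confidence intervals, assuming per-patient scores are resampled with replacement and the interval is read from the 2.5th and 97.5th percentiles of the resampled means. The percentile method, the fixed seed and the function name are assumptions; the text only states that 15,000 resamplings were used.

```python
import numpy as np


def bootstrap_ci(values, n_resamples=15_000, alpha=0.05, seed=0):
    """Mean of per-patient scores and its percentile-bootstrap (1 - alpha) CI."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample of the patients, drawn with replacement.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(values.mean()), float(lo), float(hi)


if __name__ == "__main__":
    # Hypothetical per-patient Dice scores, not results from the paper.
    scores = np.linspace(0.3, 0.9, 40)
    print(bootstrap_ci(scores))
```

Running the same function on the 2D-only and 3D-only subsets gives the separate intervals. With fewer patients per subset, those intervals are wider.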

Results
Table 2 presents the results on the whole test set as well as on the different subsets. The proposed method (BIL-Topo) outperformed the baseline across all performance metrics (improvement of about 6 percentage points of Dice, 40 mm of HD and 6 mm of MASD). Confidence intervals are quite wide due to the relatively small size of the test set (n=40), 29 but the difference was statistically significant for the boundary-based metrics (HD and MASD).
With the BIL alone, the mean Dice over the entire test set was substantially higher than with the baseline (5 points) and similar to that of BIL-Topo. In contrast, the boundary-based metrics were substantially better with BIL-Topo, demonstrating the added value of the topological loss. Results in the different subsets are consistent with those on the entire set. Performance tends to be lower in the subsets for which the rater considered the segmentation difficult (D3 and D4). Some examples of segmentations are shown in Figure 1.
Table 3 presents the results obtained separately on 2D and 3D acquisitions. For both 2D and 3D acquisitions, the BIL-Topo method achieved a higher Dice and lower HD and MASD compared to the baseline. The improvement was statistically significant for HD and MASD.

DISCUSSION
This paper introduces a novel border irregularity loss (BIL) for automatic segmentation of brain lymphomas. The BIL exploits border information and is combined with a topological loss to better handle multiple connected components.
The results demonstrate the usefulness of the proposed approaches. The BIL and topological losses capture border characteristics, improve boundary delineation, and enhance segmentation performance. Both resulted in an improved Dice score over the baseline, and the topological loss provided additional improvements in boundary metrics. Nevertheless, the overall performance remains moderate. The average Dice is around 65%, which corresponds to moderate spatial agreement. The relatively high Hausdorff distance mainly reflects the fact that, in several cases, some tumor components are missed while erroneous connected components are detected. This highlights that automatic segmentation of brain lymphomas is a very difficult task, in particular with clinical routine MRI data of heterogeneous quality. Therefore, further work is needed on this application. The present work remains preliminary and has several limitations. First, the BIL is sensitive to small connected components. Second, the smoothing parameters are chosen in an ad hoc manner, and future work should propose more general ways to set them. Finally, the impact of the BIL will need to be assessed in association with other segmentation architectures.
Future work will aim to address these limitations. Specifically, we will seek to decouple the BIL from the topological loss, so that the BIL alone can effectively handle very small objects and multiple connected components.

Figure 1. Examples of segmentations (ground truth, baseline and proposed methods). Red boxes indicate tumor parts that were missed by the models.

Table 1. Characteristics of the study population. Age (in years) is reported as mean ± standard deviation. The age was missing for 8 patients (4 in D1, 2 in D2, 1 in D3, and 1 in D4). The sex was missing for one patient from D4. The table reports the number of patients for whom a 3D acquisition was available (the others had a 2D acquisition). 1T, 1.5T and 3T indicate the MRI magnetic field strength.

Table 3. Results on the test set depending on whether the acquisition was 2D or 3D. Results are presented as mean and confidence interval (computed using bootstrapping with 15,000 resamplings). HD: 95% 3D Hausdorff distance. MASD: mean average surface distance. n is the number of samples. Best result in each case is in bold. * indicates that the improvement over the baseline is statistically significant.