Last updated: 29/03/2012
To evaluate interactive and (semi-)automatic segmentation algorithms that segment the prostate in transversal T2-weighted MR images from multiple centers and vendors and with differences in scanning protocol.
- April 19th 2012 - Training data available for download
- June 4th 2012 - Testing data available for download
- June 29th 2012 - Deadline for submission of results and papers
- July 6th 2012 - Evaluation results and notification of acceptance sent to participants
- October 1st 2012 - Workshop at MICCAI2012
To segment the prostate in transversal T2-weighted MR images. The data includes both patients with benign disease (e.g. benign prostatic hyperplasia) and patients with prostate cancer. Additionally, to test the robustness and generalizability of the algorithms, data will be from multiple centers and multiple MRI device vendors. Differences in scanning protocols will also be present in the data, e.g. patients with and without an endorectal coil.
To participate in the challenge and have your results visible on this website it is mandatory to submit a paper explaining your algorithm. The use of external datasets is in principle not allowed (although exceptions can be made for, for example, ImageNet-pretrained neural networks, as the external dataset is completely unrelated to the challenge task). If you use external data such as ImageNet, describe this in the paper.
There are 50 training cases available for download. These cases include a transversal T2-weighted MR image of the prostate. The training set is a representative set of the types of MR images acquired in a clinical setting. The data is multi-center and multi-vendor and has different acquisition protocols (e.g. differences in slice thickness, with/without endorectal coil). The set is selected such that there is a spread in prostate sizes and appearance. For each of the cases in the training set, a reference segmentation is also included.
Each downloaded file contains MR scans, stored in Meta (or MHD/RAW) format. This format stores an image as an ASCII readable header file with extension .mhd and a separate binary file for the image data with extension .raw. This format is ITK compatible. Documentation is available here. Applications that can read the data are MeVisLab, SNAP, Slicer or ParaView. If you want to write your own code to read the data, note that in the header file you can find the dimensions of the scan and the voxel spacing. In the raw file the values for each voxel are stored consecutively with index running first over x, then y, then z. The voxel-to-world matrix is also available in this header file.
The voxel type for T2-weighted images is SHORT (16 bit signed). The voxel type for the reference standard image is CHAR (8 bit signed). The reference standard image only contains the values 1 for prostate and 0 for background.
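For those writing their own reader, the layout described above can be sketched in plain Python. This is a minimal sketch, not an official loader: the function names `read_mhd` and `voxel_at` are hypothetical, and it assumes an uncompressed MHD/RAW pair whose header contains the `DimSize`, `ElementType` and `ElementDataFile` tags.

```python
import array
import os
import sys

# MetaImage element types relevant here -> array module type codes.
TYPE_CODES = {"MET_SHORT": "h", "MET_CHAR": "b", "MET_UCHAR": "B"}


def read_mhd(mhd_path):
    """Return (header dict, flat voxel array) for an uncompressed MHD/RAW pair."""
    header = {}
    with open(mhd_path, "r") as f:
        for line in f:
            if "=" in line:
                key, value = line.split("=", 1)
                header[key.strip()] = value.strip()

    dims = [int(v) for v in header["DimSize"].split()]
    count = 1
    for d in dims:
        count *= d

    # The raw file is assumed to sit next to the header file.
    raw_path = os.path.join(os.path.dirname(mhd_path), header["ElementDataFile"])
    voxels = array.array(TYPE_CODES[header["ElementType"]])
    with open(raw_path, "rb") as f:
        voxels.fromfile(f, count)

    # Swap bytes if the file's byte order differs from this machine's.
    msb = header.get("BinaryDataByteOrderMSB",
                     header.get("ElementByteOrderMSB", "False"))
    if (msb == "True") != (sys.byteorder == "big"):
        voxels.byteswap()
    return header, voxels


def voxel_at(voxels, dims, x, y, z):
    """Look up one voxel; the index runs first over x, then y, then z."""
    return voxels[x + dims[0] * (y + dims[1] * z)]
```

The header dictionary keeps every value as the original string, so the spacing (`ElementSpacing`) and the other geometry tags can be parsed or copied as needed.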
Each submission should be a single compressed archive containing the prostate segmentations of all test images. The compression should be ZIP. Segmentation files should be directly in the root of the archive, and not nested in a folder structure. Each segmentation should be an MHD/RAW file of type 8 bit unsigned char. The dimensions of each segmentation should be the same as those of the T2-weighted image it is based on. In addition, the image origin, spacing and direction should be stored correctly in the .MHD file. If you are unsure, please check your segmentation in, for example, Slicer. The filenames should start with the filename of the original T2-weighted MHD file. The segmentation images themselves should contain only the voxel values 0 and 1, where 1 is prostate and 0 is background.
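The requirements above could be met with a small script along the following lines. It is a sketch under stated assumptions: the function names and the `_segmentation` filename suffix are hypothetical, and the T2-weighted header is assumed to be available as a dictionary of strings containing the `DimSize`, `ElementSpacing`, `Offset` and `TransformMatrix` tags, which are copied unchanged so that the geometry matches the original image.

```python
import os
import zipfile


def write_mask_mhd(prefix, mask, t2_header):
    """Write `mask` (bytes of 0/1, x running fastest) as <prefix>.mhd/.raw."""
    nx, ny, nz = (int(v) for v in t2_header["DimSize"].split())
    if len(mask) != nx * ny * nz:
        raise ValueError("mask size does not match the T2-weighted image")
    if set(mask) - {0, 1}:
        raise ValueError("mask may only contain the values 0 and 1")

    lines = [
        "ObjectType = Image",
        "NDims = 3",
        "BinaryData = True",
        "BinaryDataByteOrderMSB = False",
        # Geometry is copied verbatim from the T2-weighted header.
        "TransformMatrix = " + t2_header["TransformMatrix"],
        "Offset = " + t2_header["Offset"],
        "ElementSpacing = " + t2_header["ElementSpacing"],
        "DimSize = " + t2_header["DimSize"],
        "ElementType = MET_UCHAR",
        "ElementDataFile = " + os.path.basename(prefix) + ".raw",
    ]
    with open(prefix + ".mhd", "w") as f:
        f.write("\n".join(lines) + "\n")
    with open(prefix + ".raw", "wb") as f:
        f.write(bytes(mask))
    return [prefix + ".mhd", prefix + ".raw"]


def zip_flat(zip_path, files):
    """Store every file in the archive root, without any folder structure."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=os.path.basename(path))
```

Passing `arcname=os.path.basename(path)` is what keeps the files out of nested folders, whatever directory they were written to.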
If you provide a valid set of results, we first check it for errors and then send you an e-mail with the metrics calculated against the reference standard. After we have checked the manuscript supplied with your submission, the result will be put online.
Please note that we discourage 'training on the test set' and as such we will only allow resubmission if the algorithm presented in the paper is substantially different from earlier submitted results (no parameter tweaking, that is what the training set is for) or if there were clear errors unrelated to the algorithm in previously submitted results (e.g. wrong image sizes, file names, etc.).
A link to a paper on arXiv or any other official preprint server should be provided at time of submission. The paper should at least contain a full description of the algorithm. A good example can be found here.
Together with the results a short paper describing your algorithm should be submitted. The maximum size of the paper is 8 pages and the paper should be formatted according to the LNCS guidelines. You can find a template including details on the mandatory parts of the paper at this location.
For evaluation of the algorithms a reference standard segmentation was constructed. For all the test cases the reference standard segmentation was set by an observer with multiple years of experience in prostate MRI, either as a radiologist, resident or image analysis researcher. All the reference standard segmentations were checked and corrected by a radiological resident with more than 6 years of experience in prostate MRI.
For the test cases a clinical researcher was asked to segment the prostate as a second observer. This observer did not participate in the creation of the reference standard and was blinded to it. The second observer segmentations were once again checked by the radiological resident for completeness and anomalies, but not for accuracy.
The metrics used in this study are:
- the Dice coefficient
- the relative absolute volume difference, the percentage of the absolute difference between the volumes
- the average boundary distance, the average over the shortest distances between the boundary points of the volumes
- the 95% Hausdorff distance
All evaluation metrics are calculated in 3D. We chose both boundary and volume measurements to give a more complete view of segmentation accuracy. In addition to evaluating these metrics over the entire prostate segmentation, we also calculated the boundary measures specifically for the apex and base parts of the prostate. This was done because these parts are very important to segment correctly, for example in radiotherapy and TRUS/MR fusion. Additionally, these are the most difficult parts to segment due to the large variability and high slice thickness. To determine the apex and base the prostate was divided into three approximately equal parts in the slice dimension (first 1/3 of the prostate volume was considered apex, last 1/3 was considered base).
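One way to picture the apex/base division is the sketch below. It is an illustration only, not the evaluation code: the function name is hypothetical, the prostate is assumed to occupy a contiguous run of slices, and the slice order is assumed to start at the apex. How the evaluation rounds the "approximately equal" parts is not specified, so the rounding used here is also an assumption.

```python
def split_apex_mid_base(slice_has_prostate):
    """Split the occupied slice range into three approximately equal parts.

    slice_has_prostate: one boolean per slice, True where the reference
    segmentation contains prostate voxels.
    Returns three ranges of slice indices: (apex, mid, base).
    """
    occupied = [z for z, flag in enumerate(slice_has_prostate) if flag]
    if not occupied:
        raise ValueError("segmentation contains no prostate voxels")
    first, last = occupied[0], occupied[-1]
    n = last - first + 1

    # Assumed rounding: cut at the nearest slice to 1/3 and 2/3 of the extent.
    cut1 = first + round(n / 3)
    cut2 = first + round(2 * n / 3)
    return range(first, cut1), range(cut1, cut2), range(cut2, last + 1)
```

The boundary measures for apex and base would then be computed using only the slices in the first and last range.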
Calculation of the metrics was performed in MeVisLab (www.mevislab.de). The metrics were calculated as follows:
- The Dice coefficient: two times the number of overlapping voxels between the reference segmentation and the submitted result, divided by the sum of the number of voxels in the reference segmentation and in the submitted result.
- Relative absolute volume difference: the number of voxels in the submitted result divided by the number of voxels in the reference standard, minus 1 and multiplied by 100, gives the relative volume difference as a percentage. The absolute value of this relative volume difference was used as the metric.
- Average boundary distance and 95th percentile Hausdorff distance: extract the surface of the submitted result and the surface of the reference standard. Calculate a distance map (using itkDanielsonDistanceMap) to the surface of the reference standard, mask it with the surface of the submitted result and calculate a histogram of the resulting distances. Repeat this in the opposite direction (distance map to the surface of the submitted result, masked with the surface of the reference standard) to create a second histogram, and pick the largest average and the largest 95th percentile out of the two histograms. For more information, please see the links above.
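The metric definitions above can be illustrated with a small, dependency-free sketch. It is not the MeVisLab implementation. Segmentations are represented as sets of (x, y, z) voxel indices, the surface is taken as all voxels with a 6-connected neighbour outside the object, a brute-force nearest-point search stands in for the distance map, and the percentile uses the nearest-rank rule. These choices and all function names are assumptions made for illustration, and the brute-force search is only practical for small volumes.

```python
import math

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(ref, seg):
    """Two times the overlap divided by the sum of both voxel counts."""
    return 2.0 * len(ref & seg) / (len(ref) + len(seg))


def relative_abs_volume_difference(ref, seg):
    """Absolute value of (volume ratio - 1) * 100, in percent."""
    return abs(len(seg) / len(ref) - 1.0) * 100.0


def surface(voxels):
    """Voxels with at least one 6-connected neighbour outside the object."""
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if any((x + dx, y + dy, z + dz) not in voxels for dx, dy, dz in NEIGHBOURS)
    }


def directed_distances(src, dst, spacing):
    """Sorted distances from each surface point in src to the closest in dst."""
    sx, sy, sz = spacing
    return sorted(
        min(
            math.sqrt(((x - a) * sx) ** 2 + ((y - b) * sy) ** 2 + ((z - c) * sz) ** 2)
            for (a, b, c) in dst
        )
        for (x, y, z) in src
    )


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list (assumed convention)."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def boundary_metrics(ref, seg, spacing):
    """Return (average boundary distance, 95% Hausdorff distance)."""
    s_ref, s_seg = surface(ref), surface(seg)
    d_seg_to_ref = directed_distances(s_seg, s_ref, spacing)
    d_ref_to_seg = directed_distances(s_ref, s_seg, spacing)
    # Take the larger value of the two directions, as described above.
    abd = max(sum(d) / len(d) for d in (d_seg_to_ref, d_ref_to_seg))
    hd95 = max(percentile(d, 95) for d in (d_seg_to_ref, d_ref_to_seg))
    return abd, hd95
```

The `spacing` argument is the voxel spacing from the header, so the boundary distances come out in the same physical units as the spacing rather than in voxels.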
Algorithms are ranked by comparing each evaluation measure to that of the second observer and to that of the optimal segmentation. If you perform equal to the second observer you get 85 points for that metric. If you get a perfect result you will get 100 points. Your score is calculated with the linear equation a*x + b = score, where x is your metric value and a and b are determined from the second observer value and the perfect value for that metric. As an example, if you have a Dice of 0.87 for a case and the second observer has a Dice of 0.83, then a = 15/0.17 ≈ 88.2 and b ≈ 11.8, so you will obtain a score of about 88.5 (a perfect segmentation has a Dice coefficient of 1). We average the scores over all metrics to obtain a score per case. The average over all cases is then used to rank the algorithms.
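The linear scoring rule can be written out as a short sketch. The function names are hypothetical, and the code only encodes what is stated above: the line passes through 85 points at the second observer value and 100 points at the perfect value. Any further handling, such as limits on very low scores, is not covered here.

```python
def metric_score(value, second_observer, perfect):
    """Map a metric value to points using score = a * value + b."""
    # Two fixed points define the line: (second_observer, 85) and (perfect, 100).
    a = (100.0 - 85.0) / (perfect - second_observer)
    b = 100.0 - a * perfect
    return a * value + b


def case_score(metric_scores):
    """Average the scores of all metrics for one case."""
    return sum(metric_scores) / len(metric_scores)


def final_score(case_scores):
    """Average the per-case scores; this value is used for ranking."""
    return sum(case_scores) / len(case_scores)


if __name__ == "__main__":
    # Dice: higher is better, the perfect value is 1.
    print(metric_score(0.87, second_observer=0.83, perfect=1.0))
    # A distance metric: lower is better, the perfect value is 0.
    print(metric_score(1.5, second_observer=2.0, perfect=0.0))
```

Because a and b are derived from the two fixed points, the same function works for metrics where higher is better (Dice) and for metrics where lower is better (the distance and volume difference measures).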
The overview paper was published in Medical Image Analysis (here). If you make use of the PROMISE12 dataset in your research, it is mandatory to cite this paper.
During the challenge workshop at MICCAI2012 a live challenge was held with participants from the online challenge. During the live challenge algorithms had 3 hours to segment the prostate on 20 unseen T2-weighted images.