Machine Learning in Digital Pathology: A Journey from Handcrafted Feature Descriptors to Deep Learning Approaches

Apr 3, 2017 8:00:00 AM

As outlined in the recent blog article by Dr. Ralf Huss, CMO of Definiens, we believe that big data and machine learning will significantly influence decision making in medicine. Machine learning is already a key component of digital histopathology, for example for the detection of regions of interest (e.g., tumor metastasis regions, stroma regions) or for the detection, segmentation and classification of objects of interest (e.g., nuclei, cells, mitoses, glomeruli and glands). The use of machine learning in digital pathology is motivated by the complexity and variety of the problems, and it has been enabled only recently by the availability of large amounts of raw data (e.g., the public TCGA database from the NIH), of efficient algorithms, and of increased computing power. However, while big data is getting bigger by the minute, we cannot accept algorithms without a medical plausibility check and robust clinical validation.

In this blog article, we will discuss the technical background of data-driven machine learning approaches and how they can be combined with knowledge-driven approaches to maximize the power of computational pathology.

Feature Engineering

One of the common strategies for solving image analysis problems with machine learning is to design application-specific handcrafted features, to compute these features on a set of training images, and to use the resulting feature values as input data (training samples) for training a statistical model. The model is trained to discover the mapping between the feature values and the known output values (labels). The goal of feature extraction is to identify the relevant visual context information and thereby to transfer the problem from the space of pixel intensities into a feature space in which the mapping (decision function) can be learned more easily. Prior to the recent advances in feature learning and representation learning, features were specifically engineered for each application based on prior domain knowledge about the appearance of the classes to be distinguished. Taking the example of cell and cell nucleus center detection, a popular family of features is the Fast Radial Symmetry (FRS) transform, which enables effective shape modeling of this type of object. The underlying gradient-driven voting strategy generates maps in which the centers of dark, disk-like objects appear as dark feature points. While very little effort is required to use such well-proven existing features, finding the right set of features for a novel application can be complex and may require considerable effort. To overcome this problem, feature learning and representation learning approaches are used, in which not only the model but also the visual context information is learned directly from the data.
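To make the gradient-driven voting idea more concrete, here is a minimal Python sketch of a heavily simplified, single-radius voting scheme. It is not the full FRS transform: there is no magnitude weighting, no smoothing and no combination of several radii. The function names (dark_symmetry_votes, make_dark_disk, argmin_2d) and the synthetic test image are assumptions made purely for illustration.

```python
import math

def make_dark_disk(size, center, radius):
    """Synthetic image (hypothetical test data): dark disk (0.0) on a bright background (1.0)."""
    cy, cx = center
    return [[0.0 if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 else 1.0
             for x in range(size)] for y in range(size)]

def dark_symmetry_votes(image, radius, grad_threshold=1e-6):
    """Simplified voting: every pixel with a noticeable gradient casts one
    negative vote at the position `radius` pixels against its gradient.
    The gradient points from dark to bright, so edge pixels of a dark disk
    vote towards the disk center."""
    h, w = len(image), len(image[0])
    votes = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradient
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            norm = math.hypot(gy, gx)
            if norm <= grad_threshold:
                continue
            vy = y - round(radius * gy / norm)
            vx = x - round(radius * gx / norm)
            if 0 <= vy < h and 0 <= vx < w:
                votes[vy][vx] -= 1.0  # dark centers accumulate negative values
    return votes

def argmin_2d(values):
    """Position (y, x) of the smallest entry, first one in row-major order."""
    best = (0, 0)
    for y, row in enumerate(values):
        for x, v in enumerate(row):
            if v < values[best[0]][best[1]]:
                best = (y, x)
    return best

if __name__ == "__main__":
    image = make_dark_disk(21, (10, 10), 5)
    votes = dark_symmetry_votes(image, radius=5)
    print(argmin_2d(votes))  # expected: (10, 10), the disk center
```

On this synthetic image the most negative value of the vote map lies at the disk center, which matches the description of object centers appearing as dark feature points. Real tissue images would at least require smoothing and several radii.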

Random Forests for Feature Learning

A straightforward approach to data-driven feature learning consists of automatically selecting the best set of representative features from infinite and generic feature spaces. This has been successfully performed using a Random Forest model with a large number of low-level Haar-like visual context features [Criminisi et al. 2009]. Random Forests follow a deterministic and binary divide-and-conquer strategy to build a probabilistic, piecewise constant approximation of the decision function. This strategy allows complex decision functions to be modeled while remaining computationally very efficient, since samples are hierarchically routed towards independent subparts of the forest trees in both the training and the testing phase. The use of a predefined and unchanged family of features makes Random Forests out-of-the-box systems. A well-known limitation of these models is, however, that optimization is not performed on a globally defined cost function over all the training data but in a local and greedy manner along each hierarchical route. This limits their representation learning capabilities. Nevertheless, owing to their numerous favorable properties, Random Forests have been successfully applied to many image analysis problems. In [Brieu et al., 2016], for example, an approach was developed that learns slide-specific visual context models to detect nuclear regions and automatically accounts for the high variability of tissue appearance in histopathology images. As another example, in [Peter et al., 2015] classification forests are applied to MR, 3D ultrasound and histological data without the features being explicitly designed for each of these applications, thereby demonstrating the autonomous adaptation of this approach to different visual problems.
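The combination of generic Haar-like features and greedy, local optimization can be sketched in a few lines of Python. The example below computes box-difference features from an integral image and selects the best of a handful of randomly drawn features for a single tree node by Gini impurity. It is an illustrative sketch, not the method of any of the cited papers: the box parameterization, the impurity measure and all function names (integral_image, haar_feature, best_split and so on) are assumptions.

```python
import random

def integral_image(image):
    """Summed-area table with an extra zero row and column."""
    h, w = len(image), len(image[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def box_mean(table, y0, x0, y1, x1):
    """Mean intensity over rows y0..y1-1 and columns x0..x1-1, clipped to the image."""
    h, w = len(table) - 1, len(table[0]) - 1
    y0, y1 = max(0, min(h, y0)), max(0, min(h, y1))
    x0, x1 = max(0, min(w, x0)), max(0, min(w, x1))
    area = (y1 - y0) * (x1 - x0)
    if area <= 0:
        return 0.0
    total = table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]
    return total / area

def haar_feature(table, y, x, box_a, box_b):
    """Difference of the mean intensities of two boxes given as offsets from pixel (y, x)."""
    means = [box_mean(table, y + dy0, x + dx0, y + dy1, x + dx1)
             for dy0, dx0, dy1, dx1 in (box_a, box_b)]
    return means[0] - means[1]

def random_box(rng, max_offset=4, max_size=4):
    """Random box (dy0, dx0, dy1, dx1); the ranges are arbitrary illustration values."""
    dy = rng.randint(-max_offset, max_offset)
    dx = rng.randint(-max_offset, max_offset)
    return (dy, dx, dy + rng.randint(1, max_size), dx + rng.randint(1, max_size))

def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(table, pixels, labels, n_candidates=50, seed=0):
    """Greedy node optimization: keep the random feature and threshold with the
    lowest weighted child impurity. Only this node's samples are considered."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        box_a, box_b = random_box(rng), random_box(rng)
        values = [haar_feature(table, y, x, box_a, box_b) for y, x in pixels]
        threshold = rng.choice(values)
        left = [l for v, l in zip(values, labels) if v < threshold]
        right = [l for v, l in zip(values, labels) if v >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[0]:
            best = (impurity, box_a, box_b, threshold)
    return best
```

A full forest would apply best_split recursively to the left and right sample subsets and repeat the procedure for several trees. Each node is optimized only on the samples routed to it, which is the local, greedy behavior described above.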

Convolutional Neural Networks for Representation Learning

Convolutional Neural Networks (CNNs) are very popular approaches that offer a so far unique capability to learn representations. A hierarchy of representation levels, reaching from the original image (low level) to the final output labels (high level), is built from network layers for local feature extraction (convolution layers), dimensionality reduction (pooling layers) and non-linear activation (activation layers), which alternate at each hierarchical level. The convolution layers make the CNN a sparse multilayer perceptron in which each neuron at a given level is connected only locally to the neurons of the previous level and thereby considers only a piece of the image representation. These local connections correspond to a convolution mask, and they share their weights within each layer. Compared to classical neural networks, this dramatically reduces the total number of weights to be optimized. In contrast to the standard greedy training strategy of Random Forests, the training of CNNs through backpropagation aims at minimizing a global cost function over the full training set. As outlined in [Janowczyk et al. 2016], CNNs are the basis of some recent outstanding breakthroughs in the analysis of digital pathology images, with applications including mitosis detection, nucleus segmentation, gland segmentation and metastasis detection.
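The three layer types and the effect of weight sharing can be illustrated with a small, framework-free Python sketch. It shows only the forward pass of one convolution, activation and pooling stage with a hand-picked kernel, so nothing is learned here. The function names and the toy image are assumptions made for illustration.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image (no padding, stride 1).
    As in most deep learning frameworks the kernel is not flipped,
    i.e. this is strictly speaking a cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[bias + sum(image[y + i][x + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]

def relu(feature_map):
    """Non-linear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Dimensionality reduction: keep the maximum of each size x size block."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[y * size + i][x * size + j]
                 for i in range(size) for j in range(size))
             for x in range(out_w)] for y in range(out_h)]

def count_weights_shared(kernel_h, kernel_w):
    """One convolution mask plus one bias, independent of the image size."""
    return kernel_h * kernel_w + 1

def count_weights_dense(n_inputs, n_outputs):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return n_inputs * n_outputs + n_outputs

if __name__ == "__main__":
    # Toy 6x6 image with a vertical dark-to-bright edge in the middle
    image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
    edge_kernel = [[-1.0, 0.0, 1.0]] * 3  # hand-picked, not learned
    pooled = max_pool(relu(conv2d_valid(image, edge_kernel)))
    print(pooled)                         # [[3.0, 3.0], [3.0, 3.0]]
    print(count_weights_shared(3, 3))     # 10
    print(count_weights_dense(36, 16))    # 592
```

Mapping the 6x6 input to the 4x4 feature map takes 10 parameters with a shared 3x3 mask, whereas a fully connected layer between the same 36 inputs and 16 outputs would need 592.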

Deep Neural Decision Forests

Kontschieder et al. recently introduced so-called Deep Neural Decision Forests [Kontschieder et al. 2015]. The authors propose a probabilistic routing alternative to the standard binary and deterministic routing strategy, which makes the optimization of a global cost function through backpropagation possible. Embedding the forest decision functions in the nodes of the fully connected output layer of a convolutional neural network enables representation learning. Deep Neural Decision Forests have been shown to further improve on the accuracies obtained with the prevailing deep learning approaches, thereby illustrating how teaming conceptually different machine learning approaches can lead to more effective systems.
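The difference between deterministic and probabilistic routing can be sketched for a single small tree. In the Python example below each internal node sends a sample to the left with probability sigmoid(f_n(x)), where f_n(x) stands for the activation of one output unit of the network, and the tree prediction is the average of the leaf class distributions weighted by the probability of reaching each leaf. The breadth-first node layout, the function names and the numbers are assumptions for illustration. Only the forward pass is shown, not the training procedure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_routing_probabilities(node_activations, depth):
    """Probability of reaching each leaf of a complete binary tree.
    node_activations holds f_n(x) for the 2**depth - 1 internal nodes in
    breadth-first order (root first). Each node routes left with probability
    sigmoid(f_n(x)) and right with the complementary probability."""
    probabilities = []
    for leaf in range(2 ** depth):
        node, p = 0, 1.0
        for level in range(depth):
            go_right = (leaf >> (depth - 1 - level)) & 1
            d = sigmoid(node_activations[node])
            p *= (1.0 - d) if go_right else d
            node = 2 * node + 1 + go_right  # child index in breadth-first order
        probabilities.append(p)
    return probabilities

def tree_prediction(node_activations, leaf_distributions, depth):
    """Class probabilities: leaf distributions weighted by routing probabilities."""
    mu = leaf_routing_probabilities(node_activations, depth)
    n_classes = len(leaf_distributions[0])
    return [sum(m * pi[c] for m, pi in zip(mu, leaf_distributions))
            for c in range(n_classes)]

if __name__ == "__main__":
    # Hypothetical activations for the three internal nodes of a depth-2 tree
    activations = [0.0, 2.0, -1.0]
    leaves = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
    print(leaf_routing_probabilities(activations, depth=2))
    print(tree_prediction(activations, leaves, depth=2))
```

Because every step is a smooth function of the activations, gradients of a global loss can flow back through the routing into the network. With very large positive or negative activations the routing probabilities approach 0 or 1, and the standard hard routing is recovered.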

Model Complexity and Data Availability

Machine learning, representation learning and deep learning are well-suited approaches in cases where it is challenging to define explicit features and rules to solve a problem. However, when designing a machine learning system it is good practice to consider not only the complexity of the problem but also the amount of data available to train, test and validate it. On the one hand, very deep models enable the recognition of complex patterns without explicitly formulating prior knowledge. The ISBI Camelyon challenge showed in 2016 that very deep network architectures containing millions of parameters can achieve close to perfect accuracy on the difficult problem of automated detection of metastases in H&E stained whole-slide images of lymph node sections. On the other hand, very deep networks can only reach high generalization performance if they are trained on sufficiently large data sets, which also requires enormous computational capabilities. For example, the data of the Camelyon challenge contains a total of 400 exhaustively annotated whole-slide images, of which one third is kept for validation purposes. Models containing too many parameters relative to the amount of training data tend to overfit, i.e., they typically exhibit significantly worse performance on unseen samples than on the training samples. It is an obvious fact, but unfortunately one too often neglected, that a model that performs well on training samples but cannot correctly predict unseen samples is useless. If few samples are available, less complex machine learning models such as shallow CNNs, Random Forests or rule-based systems are likely to generalize better. Such approaches are therefore still preferable to deep network architectures when training data is limited and computational efficiency is required.
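The gap between training and validation performance can be reproduced on a deliberately tiny toy problem. The Python sketch below compares a model that memorizes every training sample (1-nearest-neighbor) with a one-parameter threshold model on 30 training and 30 validation samples with some label noise. The data, the noise pattern and the function names are all invented for illustration and have nothing to do with the Camelyon data.

```python
def true_label(i):
    """Underlying rule of the toy problem: class 1 from position 30 onwards."""
    return 1 if i >= 30 else 0

def make_split():
    """Deterministic toy data (hypothetical): even positions are training samples,
    odd positions are validation samples, and every fifth label in each set is flipped."""
    train = [(i, true_label(i) ^ int(i % 10 == 0)) for i in range(0, 60, 2)]
    valid = [(i, true_label(i) ^ int(i % 10 == 5)) for i in range(1, 60, 2)]
    return train, valid

def memorizer_predict(train, x):
    """High-capacity model: returns the label of the closest training sample."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]

def fit_threshold(train):
    """Low-capacity model: predict 1 when x >= t, with t chosen among the
    training positions so that the number of training errors is minimal."""
    best_t, best_errors = None, None
    for t, _ in train:
        errors = sum(int(x >= t) != y for x, y in train)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def accuracy(predict, samples):
    return sum(predict(x) == y for x, y in samples) / len(samples)

if __name__ == "__main__":
    train, valid = make_split()
    t = fit_threshold(train)
    models = {
        "memorizer": lambda x: memorizer_predict(train, x),
        "threshold": lambda x: int(x >= t),
    }
    for name, predict in models.items():
        print(name, accuracy(predict, train), accuracy(predict, valid))
```

The memorizer is perfect on the training samples because it also reproduces their flipped labels, and it is the weaker of the two models on the validation samples. The threshold model cannot fit the noise and therefore generalizes better on this toy problem.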

Knowledge and Data Teaming

An interesting idea to compensate for the weakness of machine learning approaches when training data is scarce is to team knowledge-driven and data-driven representations. As it is not necessary to learn what is already known, focusing the feature and representation learning only on what is unknown makes it possible to build relatively simple models that can be trained on few samples while still generalizing well. In [Brieu et al. 2017] we recently proposed to team generic (e.g., Haar-like) and more application-specific (e.g., FRS) features for the detection of cell nuclei and cell centers. In this case, the key idea is that the FRS features provide good detection cues for nuclei that can be modeled as disk-like objects, while the representation of nuclei that cannot be modeled in this way is learned using generic Haar-like features. Machine learning is employed to generate a probability map representing the proximity to nucleus centers. The centers are detected as local maxima on this map. Based on the observation that the surface area provides a good complementary cue for estimating the local maxima, a second probability map representing the nuclei surface areas is additionally generated. Instead of teaming application-specific and learned features, prior knowledge can alternatively be encoded in the architecture of the models and in the structure of the data that is used for training. As an example, the contour-aware network introduced in [Chen et al. 2016] for gland segmentation aims at incorporating the observation that epithelial cell nuclei provide good boundary cues for splitting clustered objects. This model is trained on the original pathologist annotations of the gland objects as well as on the automatically extracted boundaries, with the purpose of learning explicit representations of the object contours. The latter two examples in particular illustrate how solutions can be designed so that machine learning and human cognitive capabilities complement each other.
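The detection of centers as local maxima of a probability map, with the surface area as a complementary cue, can be sketched as follows. How the area map enters the selection in [Brieu et al. 2017] is not detailed here, so the rule used below (a search radius equal to the radius of a disk with the predicted area) is purely an assumption for illustration, as are the function names and the toy maps.

```python
import math

def radius_from_area(area):
    """Radius of a disk with the given area (assumed link between area and scale)."""
    return math.sqrt(max(area, 0.0) / math.pi)

def detect_centers(prob_map, area_map, min_prob=0.5):
    """Return pixels (y, x) whose probability is at least min_prob and is not
    exceeded anywhere in a square window whose half-width is derived from the
    locally predicted surface area. Equal neighboring values are all kept."""
    h, w = len(prob_map), len(prob_map[0])
    centers = []
    for y in range(h):
        for x in range(w):
            p = prob_map[y][x]
            if p < min_prob:
                continue
            r = max(1, round(radius_from_area(area_map[y][x])))
            window = (prob_map[ny][nx]
                      for ny in range(max(0, y - r), min(h, y + r + 1))
                      for nx in range(max(0, x - r), min(w, x + r + 1)))
            if all(q <= p for q in window):
                centers.append((y, x))
    return centers

if __name__ == "__main__":
    # Toy 7x7 probability map with three candidate peaks (hypothetical values)
    prob = [[0.0] * 7 for _ in range(7)]
    prob[1][1], prob[1][2], prob[5][5] = 0.9, 0.8, 0.7
    small = [[3.0] * 7 for _ in range(7)]   # small nuclei: window half-width 1
    large = [[80.0] * 7 for _ in range(7)]  # large nuclei: window half-width 5
    print(detect_centers(prob, small))  # [(1, 1), (5, 5)]
    print(detect_centers(prob, large))  # [(1, 1)]
```

With a small predicted area the two well-separated peaks are both kept, while the weaker direct neighbor of the strongest peak is suppressed. With a large predicted area the search window grows and only the strongest peak survives, which illustrates how an area estimate can adapt the selection of local maxima to the object size.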

As artificial intelligence matures, it is crucial to leverage the abilities of modern machine learning approaches to solve complex computer vision problems. This is particularly true as some current limitations of deep learning approaches, such as computational efficiency on whole-slide histopathology images, are about to be solved using, for instance, hierarchical strategies. It is, however, equally important to develop machine learning systems that use the best of data-driven and knowledge-driven information for each specific application, rather than taking existing systems only as out-of-the-box solutions.

Dr. Nicolas Brieu
Senior Research Scientist
Definiens AG



I would like to thank Dr. Nathalie Harder, Senior Research Scientist at Definiens AG, for her valuable comments and fruitful discussions. 


[Criminisi et al. 2009] A. Criminisi, J. Shotton, and S. Bucciarelli, “Decision forests with long-range spatial context for organ localization in CT volumes,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2009 (pp. 69–80).

[Brieu et al., 2016] N. Brieu, O. Pauly, J. Zimmermann, G. Binnig, and G. Schmidt, “Slide specific models for segmentation of differently stained digital histopathology whole slide images,” in Proceedings of SPIE Medical Imaging 2016 (pp. 978410–978410).

[Peter et al., 2015] L. Peter, O. Pauly, P. Chatelain, D. Mateus, and N. Navab, “Scale-adaptive forest training via an efficient feature sampling scheme,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2015 (pp. 637–644).

[Janowczyk et al. 2016] A. Janowczyk and A. Madabhushi, “Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases,” Journal of Pathology Informatics, vol. 7, no. 1, 2016.

[Kontschieder et al. 2015] P. Kontschieder, M. Fiterau, A. Criminisi, and S. Rota Bulo, “Deep neural decision forests,” in Proceedings of IEEE International Conference on Computer Vision (ICCV) 2015 (pp. 1467–1475).

[Brieu et al. 2017] N. Brieu and G. Schmidt, “Learning size adaptive local maxima selection for robust nuclei detection in histopathology images,” to appear in Proceedings of IEEE International Symposium on Biomedical Imaging (ISBI) 2017.

[Chen et al. 2016] H. Chen, X. Qi, L. Yu, and P.-A. Heng, “DCAN: Deep contour-aware networks for accurate gland segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016 (pp. 2487–2496).