Journal:Histopathology image classification: Highlighting the gap between manual analysis and AI automation

Full article title	Histopathology image classification: Highlighting the gap between manual analysis and AI automation
Journal	Frontiers in Oncology
Author(s)	Doğan, Refika S.; Yılmaz, Bülent
Author affiliation(s)	Abdullah Gül University, Gulf University for Science and Technology
Primary contact	refikasultan dot dogan at agu dot edu dot tr
Editors	Pagador, J. Blas
Year published	2024
Volume and issue	13
Article #	1325271
DOI	10.3389/fonc.2023.1325271
ISSN	2234-943X
Distribution license	Creative Commons Attribution 4.0 International
Website	https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2023.1325271/full
Download	https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2023.1325271/pdf (PDF)

This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed.

Abstract

The field of histopathological image analysis has evolved significantly with the advent of digital pathology, leading to the development of automated models capable of classifying tissues and structures within diverse pathological images. Artificial intelligence (AI) algorithms, such as convolutional neural networks (CNNs), have shown remarkable capabilities in pathology image analysis tasks, including tumor identification, metastasis detection, and patient prognosis assessment. However, traditional manual analysis methods have generally shown low accuracy in diagnosing colorectal cancer using histopathological images.

This study investigates the use of AI for image classification and image analytics on histopathological images, alongside a manual approach based on the histogram of oriented gradients (HoG) method. The study develops an AI-based architecture for image classification using histopathological images, aiming to achieve high performance with less complexity through specific parameters and layers. We examine the complexities of histopathological image classification, focusing explicitly on categorizing nine distinct tissue types. Our research used open-source, multi-center image datasets comprising 100,000 non-overlapping images from 86 patients for training and 7,180 non-overlapping images from 50 patients for testing. The study compares two distinct approaches to automating tissue classification: training AI-based algorithms and training manual machine learning (ML) models. This research comprises two primary classification tasks: binary classification, distinguishing between normal and tumor tissues, and multi-class classification, encompassing nine tissue types: adipose, background, debris, stroma, lymphocytes, mucus, smooth muscle, normal colon mucosa, and tumor.

Our findings show that AI-based systems can achieve 0.91 and 0.97 accuracy in binary and multi-class classifications, respectively. In comparison, histogram of oriented gradients features combined with the random forest classifier achieved accuracy rates of 0.75 and 0.44 in binary and multi-class classifications, respectively. Our AI-based methods are generalizable, allowing them to be integrated into histopathology diagnostics procedures and improve diagnostic accuracy and efficiency. The CNN model outperforms existing ML techniques, demonstrating its potential to improve the precision and effectiveness of histopathology image analysis. This research emphasizes the importance of maintaining data consistency and applying normalization methods during the data preparation stage for analysis. It particularly highlights the potential of AI to assess histopathological images.

Keywords: data science, image processing, artificial intelligence, histopathology images, colon cancer

Introduction

Histopathological image analysis is a fundamental method for diagnosing and screening cancer, especially in disorders affecting the digestive system. Pathologists physically and visually examine complex images, often at resolutions of up to 100,000 x 100,000 pixels. Pathological image analysis has long depended on this manual approach, which is known to be time-consuming and labor-intensive. New approaches are needed to increase the efficiency and accuracy of pathological image analysis. Digital pathology has seen significant progress in this regard. Digitization of high-resolution histopathology images allows comprehensive analysis using complex computational methods. As a result, there has been a significant increase in interest in medical image analysis for creating automatic models that can precisely categorize relevant tissues and structures in various clinical images. Early research in this area focused on predicting the malignancy of colon lesions and distinguishing between malignant and normal tissue by extracting features from microscopic images. Esgiar et al. [1] analyzed 44 healthy and 58 cancerous features obtained from microscope images. As a result of the analysis, the percentage of occurrence matrices used equaled 90 percent. These first steps form the basis for more complex procedures that integrate rapid image processing techniques and the functions of visualization software. Digital pathology has recently emerged as a widespread diagnostic tool, primarily through artificial intelligence (AI) algorithms. [2, 3] It has demonstrated impressive capability in processing pathology images in an advanced manner. [4, 5] Advanced techniques for tumor identification, metastasis detection, and patient prognosis assessment are used regularly. These processes are intended to support the automatic segmentation of pathological images, the generation of predictions, and the use of relevant observations from this complex visual data. [6, 7]

Convolutional neural networks (CNNs) have received significant attention among the various machine learning (ML) techniques in AI research. Following the application of deep learning in previous biological research, ML has been widely accepted and used. [8–10] CNNs distinguish themselves from other ML methods through their high accuracy, generalization capacity, and computational efficiency. Each patient’s histopathology images, obtained from hematoxylin-eosin (H&E) stained tissue slides, contain important quantitative data. Notably, Kather et al. [11] have explored the potential of CNN-based approaches to predict disease progression directly from the available H&E images. In a retrospective study, their findings underscored the ability of CNNs to assess the human tumor microenvironment and prognosticate outcomes based on the analysis of histopathological images. This result showcases the potential of such AI-based methodologies in medical image analysis, offering new avenues for efficient and objective diagnostic and prognostic assessments.

On the other hand, manual analysis methods are also available in the literature to classify and predict disease outcomes using H&E images. Compared to AI-based algorithms, traditional manual analysis generally performs worse. The literature highlights that traditional methods such as local binary pattern (LBP) and Haralick perform poorly. [12, 13] These studies emphasized that deep learning is more effective than traditional ML methods in diagnosing colorectal cancer using histopathology images. The accuracy of LBP is 0.76, and Haralick’s is 0.75. Since methods such as LBP and Haralick showed low accuracy in the literature, we decided to adopt a different approach and carried out this study with the histogram of oriented gradients (HoG) method. Unlike other studies in the literature, we performed analysis using HoG features for the first time in this study. Our choice offers an alternative perspective to traditional methods and deep learning studies, and the results obtained using HoG features on a specific data set make a new contribution to the literature.

Table 1 provides an overview of manual analysis and AI-based studies from various literature sources. Jiang et al. [14] achieved a high accuracy of 0.89 using an InceptionV3 multi-scale gradient generative adversarial network (GAN) for classifying colorectal cancer histopathological images. Kather et al. [6] reported an accuracy of 0.87 using texture-based approaches, decision trees, and support vector machines (SVMs) to analyze tissues of multiple classes in colorectal cancer histology. Other studies include Popovici et al. [15] at 0.84 with VGG-f (MatConvNet library) for the prediction of molecular subtypes, Xu et al. [16] at 0.84 using a CNN for the classification of stromal and epithelial regions in histopathology images, and Mossotto et al. [17] at 0.83 using an optimized SVM for the classification of inflammatory bowel disease. Tsai and Tao [19] reported an accuracy of 0.80 with a CNN for detecting pneumonia in chest X-rays. These results show that AI-based classification studies generally achieve high accuracy rates. The primary emphasis of these studies is on AI methods for analyzing histopathological images, with a particular focus on CNNs. These networks have demonstrated high precision in a wide range of medical applications, including cancer diagnosis and screening. CNNs provide substantial benefits compared to conventional approaches, owing to their ability to handle and evaluate intricate histological data. These methods also excel in their capacity to detect patterns, textures, and structures in high-resolution images, thereby complementing or, in certain instances, even substituting the human review processes of pathologists. The promise of these AI-based techniques to change the field of medical image processing is well acknowledged.

Table 1. Overview of the literature on manual analysis and AI-based studies.
Author	Aim of research	Method	Accuracy metric
Jiang et al. [14]	Colorectal cancer histopathological images classification	InceptionV3 multi-scale gradient generative adversarial network	0.89
Kather et al. [6]	Analysis of multiple classes of textures in colorectal cancer histology	Texture-based approaches, decision trees, and SVMs	0.87
Popovici et al. [15]	Prediction of molecular subtypes	VGG-f (MatConvNet library)	0.84
Xu et al. [16]	Classifying the stromal and epithelial sections of histopathology pictures	CNN	0.84
Mossotto et al. [17]	Classification of inflammatory bowel disease	Optimized SVM	0.83
Sena et al. [18]	Tumor tissue classification	Custom CNN (4CL, 3FC)	0.81
Tsai and Tao [19]	Chest X-ray pneumonia detection	CNN	0.80
Shapcott et al. [20]	Classification of nuclei	CNN based on Tensorflow "ciFar" model	0.76

Materials and methods

Dataset

Our research was based on two separate datasets, selected and prepared for use as our training and testing sets. The training dataset (NCT-CRC-HE-100K) was compiled from the pathology archives of the NCT Biobank (National Center for Tumor Diseases, Germany) and includes records from 86 patients. The testing dataset (NCT-VAL-HE-7K) was generated by the University Medical Center Mannheim (UMM), Germany [11, 21], and includes data from 50 patients. Both are open-source image datasets derived from formalin-fixed paraffin-embedded colorectal cancer tissue. The training dataset consisted of 100,000 high-resolution H&E (hematoxylin and eosin) images.

The testing dataset consisted of 7,180 non-overlapping sub-images. Each sub-image has dimensions of 224x224 pixels at a resolution of 0.5 microns per pixel. The dataset includes nine distinct tissue textures, each capturing the subtle characteristics of a tissue type. These span a broad spectrum, from adipose tissue to lymphocytes, mucus, and cancer epithelial cells. Table 2 presents the distribution of images within the training and test datasets, segmented by tissue class. For instance, the training dataset contains 14,317 samples in the colorectal adenocarcinoma epithelium (TUM) class, while the testing dataset contains 1,233 samples for this class. These statistics give readers an understanding of the data distribution and the relative size of each class within the study, forming the foundation for our subsequent analyses and model development.

Table 2. The number of H&E images in the training and test sets used in the study. ADI - adipose; BACK - background; DEB - debris; LYM - lymphocytes; MUC - mucus; MUS - smooth muscle; NORM - normal colonic mucosa; STR - cancer-associated stroma; TUM - colorectal adenocarcinoma epithelium.
Classes	# of images in training set	# of images in test set
ADI	10,407	1,338
BACK	10,566	847
DEB	11,512	339
LYM	11,557	634
MUC	8,896	1,035
MUS	13,536	592
NORM	8,763	741
STR	10,446	421
TUM	14,317	1,233
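As a quick arithmetic check on Table 2, the following minimal Python sketch sums the per-class counts and computes each class's share of the total. The dictionary and function names are illustrative and not part of the original study; only the counts themselves come from the table.

```python
# Image counts per tissue class, copied from Table 2.
TRAIN_COUNTS = {
    "ADI": 10407, "BACK": 10566, "DEB": 11512, "LYM": 11557, "MUC": 8896,
    "MUS": 13536, "NORM": 8763, "STR": 10446, "TUM": 14317,
}
TEST_COUNTS = {
    "ADI": 1338, "BACK": 847, "DEB": 339, "LYM": 634, "MUC": 1035,
    "MUS": 592, "NORM": 741, "STR": 421, "TUM": 1233,
}


def class_shares(counts):
    """Return each class's fraction of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


train_total = sum(TRAIN_COUNTS.values())  # 100000
test_total = sum(TEST_COUNTS.values())    # 7180
train_shares = class_shares(TRAIN_COUNTS)
```

The totals agree with the 100,000 training and 7,180 test images stated in the text, and the shares show that the classes are not perfectly balanced (TUM is the largest training class, NORM the smallest).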

All images in the training set were normalized using the Macenko method. [22] Figure 1 describes the effect of Macenko normalization on sample images. The torchstain library [23], which supports a PyTorch-based approach, is available for color normalization of the image using the Macenko method.

Figure 1. (A) Target/reference image, (B) source image, and (C) normalized source image.

Figure 1A shows this method’s target/reference image, while Figure 1B shows the source image. Macenko normalization aims to make the color distribution of the source images compatible with that of the target image. In the example shown in the figure, normalizing the source image (Figure 1B) with the target image (Figure 1A) as a reference reduces color mismatches and yields a more consistent color profile, as seen in Figure 1C. This makes it possible to obtain more reliable results in machine learning or image analytics applications. Normalization was performed on the dataset on which the model was trained, and applying this normalization to the test set could increase the model’s generalization ability. However, the test set represents real-world setups and consists of images routinely obtained in the pathology department. Therefore, since we wanted to train a clinically meaningful model that copes with different color conditions, normalization was not applied to the test set. In this way, we also investigated the effect of color normalization on classifying different types of tissues. The original samples from nine different tissue types, shown in the first row of Figure 2, have substantially different color stains, whereas the second row of Figure 2 shows their normalized versions, which are transformed to the same average intensity level.

Figure 2. The first row shows adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colonic mucosa (NORM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM). The second row was obtained by applying normalization to the same tissue examples shown in the first row.

Manual analysis algorithm

In the traditional manual analysis, classification was based on extracted HoG features. HoG features are a class of local descriptors that capture crucial characteristics of images and videos. They are commonly applied across various domains, in tasks such as image classification, object detection, and object tracking. [24, 25]

The following parameters were used to extract HoG features:

Number of orientations: nine; this is the number of gradient directions calculated in each cell.
Pixels per cell: each cell consists of 10x10 pixels.
Cells per block: each block contains 2x2 cells.
Rooting and block normalization: Using the `transform_sqrt=True` and `block_norm="L1"` options, square-root compression and L1 norm-based block normalization were performed to reduce lighting and shading effects. The resulting features are more robust and amenable to comparison, especially under variable lighting conditions. This can improve the model’s overall performance in image recognition and classification tasks.

These parameters were chosen to make the HoG feature extraction process efficient and accurate, with the aim of achieving good performance in colorectal cancer tissue classification.

We chose HoG features, one of the local descriptors, because HoG processes the image by dividing it into small regions called cells. As illustrated in Figure 3, the cells are created from the original images. In each cell, the gradients of the pixels in the x-direction (S_x) and the y-direction (S_y) are calculated, and the gradient magnitude S is obtained as:

S={\sqrt {{S_{x}}^{2}+{S_{y}}^{2}}}

The gradient direction θ, measured from the x-axis, is then computed from the same gradients as [26]:

\theta =\tan ^{-1}({\frac {S_{y}}{S_{x}}})
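The two gradient quantities can be illustrated with a minimal Python sketch. It assumes the common HoG convention in which the direction is measured from the x-axis as atan2(S_y, S_x), orientations are unsigned (folded into the range 0 to 180 degrees), and each gradient is assigned to one of nine equal-width bins without interpolation. The function names are illustrative, and real implementations may differ in these details.

```python
import math


def gradient_magnitude_and_direction(s_x, s_y):
    """Return (magnitude, unsigned direction in degrees) for one pixel."""
    magnitude = math.hypot(s_x, s_y)            # sqrt(s_x**2 + s_y**2)
    angle = math.degrees(math.atan2(s_y, s_x))  # signed angle from the x-axis
    return magnitude, angle % 180.0             # fold into [0, 180)


def orientation_bin(angle_deg, n_bins=9):
    """Index of the histogram bin; each bin spans 180 / n_bins degrees."""
    return int(angle_deg // (180.0 / n_bins)) % n_bins


# Example: horizontal gradient 3, vertical gradient 4.
mag, ang = gradient_magnitude_and_direction(3.0, 4.0)
bin_index = orientation_bin(ang)
```

For the example gradients (3, 4), the magnitude is 5 and the direction is roughly 53 degrees, which falls in the third of the nine 20-degree bins (index 2).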

Figure 3. (A) Original image. (B) Extracted HoG features from the image. (C) HoG features shown on the original image.

After calculating the gradients, a histogram is computed for each cell, and these histograms are combined to form blocks. Normalization is performed on the blocks to reduce lighting and shading effects. The study involved a comprehensive analysis of the image set, in which all images initially had a standard dimension of 224x224 pixels. To enhance the classification performance of the features derived from these images, the images were resized to a standardized dimension of 200x200 pixels using bilinear interpolation. The decision to reduce the image dimensions from 224x224 pixels to 200x200 pixels was taken to optimize the calculation time. In particular, a 224x224 image produces a feature vector of 7,200 values, while a 200x200 image processed in the same way produces 5,832 values. The feature vector size is given by:

Feature\ vector\ size=N\ \times \left({{\frac {W}{C_{W}}}-CpB_{x}+1}\right)\times \left({{\frac {H}{C_{H}}}-CpB_{y}+1}\right)\times Orientations\times CpB_{x}\times CpB_{y}

... where N is the number of pixels in the image; W and H are the image’s width and height, respectively; C_W (cell width) and C_H (cell height) define the cell dimensions; CpB_x and CpB_y are the number of cells per block in each direction; and Orientations is the number of gradient directions calculated for each cell in the HoG feature vector. Dimensionality reduction not only reduces computational time; smaller feature vectors can also reduce memory usage and the overall complexity of the model. This shortens model training and prediction times, especially when working on large data sets.
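The block-counting part of this formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the leading factor N is left out so that the function counts values for a single image, blocks overlap with a stride of one cell, and partial cells at the image border are discarded by integer division. The function name and default arguments are illustrative.

```python
def hog_feature_length(width, height, cell=(10, 10), cells_per_block=(2, 2),
                       orientations=9):
    """Length of a HoG descriptor for one image (blocks slide by one cell)."""
    cells_x = width // cell[0]    # whole cells along the width
    cells_y = height // cell[1]   # whole cells along the height
    blocks_x = cells_x - cells_per_block[0] + 1
    blocks_y = cells_y - cells_per_block[1] + 1
    values_per_block = orientations * cells_per_block[0] * cells_per_block[1]
    return blocks_x * blocks_y * values_per_block


length_224 = hog_feature_length(224, 224)
length_200 = hog_feature_length(200, 200)
```

With the parameters listed earlier (10x10-pixel cells, 2x2-cell blocks, nine orientations), this layout yields 15,876 values for a 224x224 image and 12,996 for a 200x200 image. These differ from the counts quoted in the text, which may reflect settings not fully stated here, but the relative saving from the smaller image size is of the same order.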

This study used the random forest (RF) algorithm as the ML model for colorectal cancer tissue classification. RF is an ensemble learning method that creates a robust and generalizable model by combining multiple decision trees. The main reason for this choice is that RF performs well on different data sets, including complex, large, and high-dimensional ones. RF is resilient to noise and anomalies in the data set. It can also evaluate relationships between variables, increasing the stability of the model, and it mitigates overfitting to the training data. These features support the suitability of the RF algorithm for colorectal cancer tissue classification. Based on preliminary tests and analyses, RF was judged the most suitable model for obtaining reliable results.

RF is a practical tool for biomedical imaging and tissue analysis, with features such as high-dimensional data processing ability, accuracy, and robustness. Its application also faces significant challenges, such as computational efficiency, potential overfitting, and especially interpretability and explainability. The literature reports that the algorithm is feasible and may increase security, especially for the future of medicine and clinical research. Comprehensive feature selection also plays a critical role in learning and makes the combined results more accurate and robust, which is important for increasing efficiency in medicine and clinical research. [27, 28]

AI-based automation

In this study, a simple CNN architecture was developed for the image classification problem. The model aimed to achieve high performance with less complexity through specific parameters and layers. We trained it using the same H&E images that we used in the manual analysis part. Our aim is to compare the performance of manual analysis and AI-based automation methods in classifying colorectal cancer tissue images. Table 3 shows the parameter and structure information of the CNN model.

Table 3. Details, complexity, and hyperparameters of the simple CNN architecture we built for the AI automation approach. ReLU - rectified linear unit.
CNN layers	Parameters and explanations	Complexity
1. Image input	28x28x3 images with "zscore" normalization	Low (preprocessing step)
2. Convolution	Eight 3x3 convolutions with stride [1 1] and padding "same"	Moderate (only eight filters)
3. Batch normalization	Batch normalization	Low to moderate
4. ReLU	ReLU	Low
5. Max pooling	2x2 max pooling with stride [2 2] and padding [0 0 0 0]	Low
6. Convolution	Sixteen 3x3 convolutions with stride [1 1] and padding "same"	Moderate
7. Batch normalization	Batch normalization	Low to moderate
8. ReLU	ReLU	Low
9. Max pooling	2x2 max pooling with stride [2 2] and padding [0 0 0 0]	Low
10. Convolution	Thirty-two 3x3 convolutions with stride [1 1] and padding "same"	Higher due to the increased number of filters
11. Batch normalization	Batch normalization	Low to moderate
12. ReLU	ReLU	Low
13. Fully connected	Three fully connected layers	High
14. Softmax	softmax	Low
15. Loss function	crossentropyex	Low

In the manual analysis section, features were first extracted from the images, and the model was trained using these features. In contrast, in the AI automation part we used the images directly as input, then created a model suitable for the purpose and carried out the training process. In the AI automation approach, we used local filters, intermediate steps, and a multilayer artificial neural network model to train the base CNN model (Table 3). [10] This table explains the layers, structures, and hyperparameters of the CNN model used in the AI automation section. The model uses the images directly and applies essential operations such as sequential convolution, batch normalization, and rectified linear unit (ReLU) activation. Finally, the results are classified with softmax activation and a cross-entropy-based loss function. This model reflects a layered structure aimed at classifying colorectal cancer tissue images. This approach has played an important role in comparing the performance of manual analysis and AI automation methods.
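To make the layer sequence in Table 3 concrete, the following minimal Python sketch traces the output shape after each convolution and pooling step, and counts the learnable weights and biases of each convolution. It assumes 3x3 "same" convolutions with stride 1 and 2x2 max pooling with stride 2 and no padding, as listed in the table. Batch normalization parameters and the fully connected layers are not counted, since their sizes are not given. All function names are illustrative.

```python
def conv_same(shape, n_filters, kernel=3):
    """'Same' convolution with stride 1: spatial size kept, channels replaced."""
    h, w, c_in = shape
    n_params = kernel * kernel * c_in * n_filters + n_filters  # weights + biases
    return (h, w, n_filters), n_params


def max_pool(shape, size=2, stride=2):
    """Max pooling without padding."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)


def trace_network(input_shape=(28, 28, 3)):
    """Return (layer name, output shape, conv parameter count) per step."""
    trace = []
    shape = input_shape
    for i, n_filters in enumerate((8, 16, 32)):
        shape, n_params = conv_same(shape, n_filters)
        trace.append((f"conv{i + 1}", shape, n_params))
        if i < 2:  # Table 3 lists pooling after the first two blocks only
            shape = max_pool(shape)
            trace.append((f"pool{i + 1}", shape, 0))
    return trace


layers = trace_network()
final_h, final_w, final_c = layers[-1][1]
flattened = final_h * final_w * final_c  # inputs to the fully connected part
```

Under these assumptions, the 28x28x3 input becomes 14x14x8 after the first pooling step, 7x7x16 after the second, and 7x7x32 after the third convolution, so 1,568 values reach the fully connected layers.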

This AI automation model is designed to extract and classify features in histopathological images. In the first layer, the model receives color (RGB) images of 28x28 pixels as input. Input images are processed with "zscore" normalization, which brings the mean of the data to zero and its standard deviation to one. The model's architecture then includes a series of convolutional layers, batch normalization, and ReLU activation functions. Convolution layers move over the image to extract feature maps and highlight important features. Batch normalization helps train the network faster and provides more stable performance. ReLU activation functions set negative values to zero, increasing the learning ability of the model. Max pooling layers shrink the feature maps and increase the model’s scalability. Through these layers, the model learns increasingly complex, high-level features. Finally, the model uses fully connected layers to assign learned features to specific classes and applies the softmax activation function to the results.
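The "zscore" normalization step can be illustrated with a short Python sketch. It assumes the population standard deviation and operates on a flat list of pixel intensities. How the statistics are actually pooled (per channel or per image, and over which images) is not stated in the text, so that choice is left open here. The function name is illustrative.

```python
import statistics


def zscore(values):
    """Shift values to zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - mean) / std for v in values]


# Example with three pixel intensities on a 0-255 scale.
normalized = zscore([0, 128, 255])
```

After normalization the values have a mean of zero and a standard deviation of one, which keeps the inputs to the first convolution layer on a comparable scale.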

The softmax layer provides a probability distribution over the classes. The cross-entropy loss function guides the learning of the model by comparing these probabilities with the correct classification labels. As a result, this AI automation model can improve feature extraction and classification capabilities in histopathology image classification tasks.

The parameters used to train the model were selected based on starting values commonly accepted in the literature. [29] In the early stages of the training process, we sought gradual improvement by varying the initial learning rate. Maximum epoch counts of 50 and 100 were considered, allowing the model to follow the training data over a long period. At each epoch, the training data were shuffled to produce a more comprehensive and independent representation unaffected by prior learning. We tried batch sizes ranging from 32 to 64 to ensure uniform processing of the samples. We also selected the validation data and determined the evaluation frequency. The model was evaluated continuously throughout the training process. By reducing overfitting, the model achieved greater generalizability and more reliable results. We used a trial-and-error approach, monitoring the model’s performance, to determine these parameters. Our findings show that the selected parameters yielded the best results among those tried and that the model learns effectively from the dataset.

For the final training run, we set the parameter values that led to the successful training of the model as follows. The initial learning rate was set to 0.01 and the maximum number of epochs to 50, where each epoch is one complete pass over the entire dataset. The data were shuffled after every epoch. The batch size was set to 64, meaning that 64 data samples are processed together. Validation was performed every 20 epochs. We chose these parameters so that the model would achieve the necessary level of success and fit the data set as well as possible.

In summary, we developed a CNN model to extract essential features and classify histopathology images. The model takes 28x28x3 RGB images as input, processing them with "zscore" normalization. The model structure includes three 3x3 convolution layers containing 8, 16, and 32 filters. Batch normalization and the ReLU activation function follow each convolution layer. Additionally, 2x2 max pooling layers follow the first two convolution blocks (Table 3). In the final stages of the model, there are three fully connected layers, followed by the softmax activation function as the last layer. In terms of training parameters, listed in Table 4, the model is trained with an initial learning rate of 0.01 for a maximum of 50 epochs. Data shuffling is applied at the end of each epoch, and the batch size is 64. The performance and generalization ability of the model are monitored with validation dataset evaluations performed every 20 epochs. These parameters help the model adapt to the data set, reach the desired level of success, and avoid problems such as overfitting during the training process. With this configuration, the model has the feature extraction and classification capabilities needed to produce effective and accurate results in histopathological image classification.

Table 4. The CNN-based automation model training parameters.
Training parameter	Value
Initial learning rate	0.01
Maximum number of epochs	50
Data shuffle	After each epoch
Batch size	64
Validation data evaluation frequency	Every 20 epochs
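The values in Table 4 can be collected in a small configuration object, as in the following Python sketch. The class and helper names are hypothetical and not taken from the original study. The iteration count assumes that all 100,000 training images are used in every epoch and that a final partial batch is kept, neither of which is confirmed by the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Training parameters from Table 4 (field names are illustrative)."""
    initial_learning_rate: float = 0.01
    max_epochs: int = 50
    shuffle_every_epoch: bool = True
    batch_size: int = 64
    validation_every_n_epochs: int = 20


def iterations_per_epoch(n_images, cfg):
    """Number of mini-batches per epoch, keeping a final partial batch."""
    return math.ceil(n_images / cfg.batch_size)


def validation_epochs(cfg):
    """Epochs at which the validation set would be evaluated."""
    step = cfg.validation_every_n_epochs
    return list(range(step, cfg.max_epochs + 1, step))


cfg = TrainingConfig()
batches = iterations_per_epoch(100_000, cfg)
checkpoints = validation_epochs(cfg)
```

Under these assumptions, one epoch consists of 1,563 mini-batches, and validation would run after epochs 20 and 40.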

The cross-entropy loss calculates the difference between the probability distribution that the model predicts and the probability distribution of the actual labels. The formula for binary classification is as follows:
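As a hedged illustration, the standard binary cross-entropy averages -[y log(p) + (1 - y) log(1 - p)] over the samples, where y is the true label (0 or 1) and p is the predicted probability of class 1. The Python sketch below assumes this usual definition, together with a numerically stable softmax for the multi-class case. The function names and the clipping constant are illustrative choices, not taken from the original study.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    losses = []
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


probs = softmax([1.0, 2.0, 3.0])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
uncertain = binary_cross_entropy([1, 0], [0.6, 0.4])
```

Predictions that are close to the true labels give a smaller loss than uncertain ones, which is the property the training process exploits.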

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage was substantially updated for improved readability. In some cases important information was missing from the references, and that information was added.