
Comparing the Efficiency of Novel Point-Of-Care, Low-Cost Neural Networks in Identifying Specific Stages of Diabetic Retinopathy Across a Limited Retinal Dataset

A Research/Engineering Project Conducted By Nicholas Harty and Aum Dhruv - (2022-2023)

Prerequisites:

EyePACS DR Retinal Imaging Classification Library

Presentation:

Experimental Utility (JS):

Methods, Procedures, and Analysis:

Independent: 
  • Training sample sizes (intervals at 125, 250, 375, 500, 1000, 1500, 2000, 2500) 
  • Type of NN algorithm
Dependent: 
  • Accuracy of each trial 
  • Correlation between training sample size and mean accuracy 
  • Difference between CNN and KNN with increasing sample sizes (line graph)
Control:
  • Database of images
  • Number of testing trials (10 trials with 100 unseen samples)
  • Retinal Dataset (EyePACS/Google Retinal Dataset -> Download Above)
  • TensorFlow API (through ml5.js library)
  • Computing Utility (Chrome Capable Device)
  • Data Sheet Recording Software
  • External Graphing Software (Statistical Analysis)
  • Online Testing Portal that allows for the Virtual Testing of the Researchers’ Algorithms
While the KNN and CNN algorithms are similar in structure, there are differences that may act as limitations. The main difference between the convolutional neural network and the k-nearest neighbors algorithm lies in how each discerns the distinct features of an image. The convolutional neural network processes features regardless of spatial orientation because of its use of kernels that scan across training images, while the k-nearest neighbors algorithm identifies features based on color/intensity and takes spatial orientation into account when classifying testing images. Another difference lies in their individual training processes. The k-nearest neighbors algorithm trains on its sample dataset by storing the data as logit-based matrices, which requires little computational power since it is simply building a record/ledger for later classification. The convolutional neural network, on the other hand, uses higher computational power in training as it cycles through multiple convolutional layers and refines its feature identification across a series of predefined epochs. 
  • Although these two algorithms are mathematically distinct in almost every component of their individual function, their output, in terms of correlation, is the same, and they share similar efficiency when tasked with the training/classification of bitmapped medical datasets; hence the comparison is justified.
  In conditions with a lower sample count, the KNN and CNN algorithms may output a greater spread of results because there is less data to work with. 
  • To combat this, the researchers incorporated multiple sample sizes.
  The essential assumptions of the investigators’ study include that the initial dataset, derived from EyePACS’s diabetic retinopathy study, was accurate in the classifications provided by physicians. Another assumption is that the algorithms were calibrated to produce equal probabilities in fundamental testing (see “Methods” for more information). A third assumption, relevant to replicating the investigators’ study, is the equality of hardware across testing; this ensured equal allotments of RAM (1 GB) and CPU clock speed (locked at 1.4 GHz). The final major assumption made in experimentation concerns the equality of sample bitmap dimensions. Each image within the investigators’ dataset was downscaled to a 250px by 250px bitmap to ensure efficiency in testing. In a system that replicates the experimental procedure with higher input dimensions, the probabilities could differ given the greater fidelity of the developed feature set. 

Data Training Collection:

The researchers collected data from EyePACS, an organization that provided access to numerous retinal images classified by stage of severity. The stages consisted of “no present diabetic retinopathy”, “mild diabetic retinopathy”, “moderate diabetic retinopathy”, “severe diabetic retinopathy”, and “proliferative diabetic retinopathy.” 250 randomly selected images were withheld from the sample for later use in testing. To reduce bias, the researchers also incorporated random sampling when training the k-nearest neighbors model and the convolutional neural network model. Both models were trained on random samples of 25, 50, 75, 100, 200, 300, 400, and 500 images per stage (totaling 125, 250, 375, 500, 1000, 1500, 2000, and 2500 retinal images, respectively).
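The stratified draw described above can be sketched in JavaScript (the language of the project's web utility); the pool layout and the function names here are illustrative assumptions, not the study's actual code:

```javascript
// The five DR severity stages used throughout the study.
const STAGES = ['none', 'mild', 'moderate', 'severe', 'proliferative'];

// Fisher-Yates shuffle over a copy, so each draw is an unbiased random sample.
function shuffled(arr) {
  const a = arr.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// pools maps each stage name to its array of candidate images.
// Draw `perStage` images at random from every severity pool.
function drawTrainingSet(pools, perStage) {
  return STAGES.flatMap((stage) => shuffled(pools[stage]).slice(0, perStage));
}
```

Calling `drawTrainingSet(pools, 100)`, for example, would yield one of the study's 500-image training sets (100 images per stage); the per-stage counts used were 25, 50, 75, 100, 200, 300, 400, and 500.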

Data Processing:

The data was processed from its raw form into composed tables that distinguished probabilities by the specific trial instance and by the severity of diabetic retinopathy (DR) present in the sample image. From there, the data was converted into two radar graphs (as well as two bar graphs) that display the change in size/area across training sample sizes within each individual DR severity. The data from the algorithms was then composed into two line graphs: one displaying the raw mean accuracy across training sample sizes and the other displaying the buffered mean accuracy across training sample sizes.

Algorithm Design/Accuracy: 

The KNN and CNN algorithms were designed to be developed and run within a standard web/HTML environment. This environment was specifically selected to broaden the accessibility of the software to all modern hardware across most operating systems (although the investigators conducted all logistic experimentation on hardware of equal limitations). The algorithms were developed through the TensorFlow API in tandem with the JavaScript library ml5.js to produce a competent web utility. The fundamental design of these algorithms relied upon the constraints provided by the investigators, which were balanced through preliminary testing to ensure essential equality of outcome. This outcome took the form of image classification probabilities, and the image dataset used for this balancing was provided by Google in the form of the TensorFlow 64×64 Clothing dataset (its usage was based on the recommendation of data scientists in relation to TensorFlow). This data provided a standardized reference point for balancing the algorithms, and much of the adjustment occurred in altering the epoch count of the CNN program. Unlike the KNN algorithm, which quickly builds a ledger of logits during training, the CNN undergoes a series of training cycles to refine its feature identification, so the number of allowed epochs would shift the basic equality of these algorithms at a fundamental level. The researchers found that ~50 epochs of CNN training were enough to ensure equality in fundamental testing. This allowed precision within the investigators’ experimentation in identifying areas of difference without the bias of adjustable internal factors, securing a design that was stable in testing and worthy of true statistical comparison. 

Data Testing Collection: 

Prior to training the neural network models, 250 images were withheld from the sample data collected from EyePACS. In each trial, a random sample of 25 of these images was tested on each trained model, for a total of 10 trials. This process was duplicated for each severity level, utilizing 1250 images in testing across 50 net trials over the 5 severities.

Statistical Background: 

The investigators collected a combination of nominal (type of algorithm, type of training sample size, and DR severity) and ratio data (accuracy). The type of training sample size consisted of n=125, n=250, n=375, n=500, n=1000, n=1500, n=2000, and n=2500. The stages of DR severity consisted of “no present diabetic retinopathy”, “mild diabetic retinopathy”, “moderate diabetic retinopathy”, “severe diabetic retinopathy”, and “proliferative diabetic retinopathy.” The investigators’ data examined the relationship between the type of algorithm and accuracy, for different training sample sizes in identifying specific stages of diabetic retinopathy in retinal images. For each type of training sample size, researchers incorporated an independent measures design in which different retinal images were used per trial, and the quantitative variable (accuracy) was measured in reference to two categorical variables (type of algorithm and DR severity). Because of this, a two-way ANOVA test (analysis of variance) with a significance level (α) of 0.05 was used to determine the experiment’s statistical significance.

1. Researchers should prepare a modern internet-connected computing device.

2. After doing so, open up a browser to run the neural networks. Google Chrome, Firefox, or any other browser will work.

3. Go to the search bar and enter the link "medibound.com/project", then press Enter. This website is programmed to run and compare the available neural networks based upon an uploaded retinal (20D) image. Upload the training sample images to the “Train The Model” dropdown at each interval of n = 125, 250, 375, 500, 1000, 1500, 2000, and 2500. Following each upload, activate the training module and download the appropriate neural network models (3 for CNN, 1 for KNN).

a.) [OR] Alternative Ground-Up Development Plan: 

i.) Develop two contrasting neural networks (a CNN configuration and a KNN configuration) within Keras’s TensorFlow architecture creator. 

ii.) Ensure that the following constraints are provided to the CNN algorithm:

            • Debug = true
            • Inputs = [64 (px), 64 (px), 4]
            • Task = ‘imageClassification’
            • Epochs = 50

iii.) Ensure that the following constraints are provided to the KNN algorithm:

            • K = Auto
            • Debug = true

iv.) Upload the training sample images to the “Training” dropdown at each interval of n = 125, 250, 375, 500, 1000, 1500, 2000, and 2500. Following each upload, activate the training module and download the appropriate neural network models for each interval of sample images (3 per CNN interval, 1 per KNN interval; 32 in total).
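Assuming the ml5.js pipeline described under "Algorithm Design/Accuracy", the constraint sets in steps ii and iii would roughly correspond to configuration objects like the following. This is a sketch, not the study's actual code; the ml5 constructor calls are left in comments because they require a browser with ml5 loaded.

```javascript
// Constraint set for the CNN (step ii), as it would be passed to
// ml5.neuralNetwork() -- ml5's wrapper over TensorFlow.js.
const cnnOptions = {
  task: 'imageClassification',
  inputs: [64, 64, 4], // width (px), height (px), RGBA channels
  debug: true,         // show the training visor
};
const cnnTraining = { epochs: 50 }; // balanced through preliminary testing

// Constraint set for the KNN (step iii): ml5.KNNClassifier() builds its
// ledger of logits at training time; k is left to the library default
// ("Auto" in the procedure above).
const knnOptions = { debug: true };

// In a browser with ml5 loaded, the models would be created roughly as:
//   const cnn = ml5.neuralNetwork(cnnOptions);
//   cnn.train(cnnTraining, whenDone); // whenDone: completion callback
//   const knn = ml5.KNNClassifier();
```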

4. After storing the models from the training module, go to "project.medibound.com" on the web and select the “Test The Model” dropdown.

5. Insert 25 for the “Random Sample Size per Individual Trial Group (# of Images)” and distribute 250 testing retinal images into each severity category in “Testing Samples”. 

6. From this point, upload both of the model sets received from the training module for a specific training sample size and insert the corresponding training sample size under “Sample Size of Training Group (# of Images)”.

7. Confirm that all data has been recorded. If any data is missing, redo that part of the neural network testing.

8. Repeat Steps 6 and 7 until all training sample size models are experimented upon and properly recorded.

9. Analyze and format all collected data.

Results

To measure the accuracy of the CNN and KNN algorithms, mean and buffered accuracy were calculated for each training sample size. For both algorithms, the mean accuracy (%) was the average of all 50 trials (10 per DR severity). The buffered accuracy (%) allowed a 1-severity margin on either side of the definite outcome, resulting in an amplified marker of the mean accuracy. The results across all eight training sample sizes were as follows:

  Training Size   CNN Mean   CNN Buffered   KNN Mean   KNN Buffered
  125             19.36%     48.00%         20.36%     50.00%
  250             20.08%     52.00%         20.80%     54.00%
  375             22.88%     56.00%         21.84%     54.50%
  500             22.88%     56.44%         22.32%     54.56%
  1000            25.60%     58.00%         24.68%     55.00%
  1500            29.04%     59.00%         25.20%     56.00%
  2000            30.64%     62.00%         25.20%     56.22%
  2500            32.16%     63.33%         26.64%     57.00%
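A minimal sketch of this scoring rule, assuming the five severities are encoded as ordinal indices 0 ("no present DR") through 4 ("proliferative DR"); the function and variable names are illustrative:

```javascript
// Score a set of trials: a prediction is exactly correct when it matches
// the true severity index, and "buffered correct" when it lands within
// one severity step of the truth.
function accuracies(predictions, labels) {
  let exact = 0;
  let buffered = 0;
  for (let i = 0; i < predictions.length; i++) {
    const diff = Math.abs(predictions[i] - labels[i]);
    if (diff === 0) exact++;
    if (diff <= 1) buffered++; // 1-severity margin on either side
  }
  const n = predictions.length;
  return { mean: (100 * exact) / n, buffered: (100 * buffered) / n };
}
```

By construction the buffered figure can never fall below the mean figure, which matches the amplification seen in every row of the results above.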

Trends

In both algorithms, there was a general increase in accuracy as the training sample size increased. In the CNN algorithm, this increase was steeper than in the KNN algorithm, as can be noted in the slopes of the provided line graphs. In addition, the radar graphs and bar graphs highlighted individual upward trends within each category of diabetic retinopathy as sample size increased. From the line graphs, the pattern of CNN’s spike is evident: the CNN mean accuracy line makes a sharp increase at around 375 samples and, in doing so, surpasses the mean accuracy of KNN.

Significance

When presented with training sample sizes of 125 and 250 retinal images, KNN was more accurate, which could justify its use over CNN at lower sample sizes. However, at training sample sizes of n=375 and above, CNN was more accurate, which could justify its use over KNN at higher sample sizes. This supported the investigators’ hypothesis, since the CNN algorithm produced greater average accuracy across higher training sample sizes in this identification process, while the KNN algorithm produced greater average accuracy across lower training sample sizes. At the largest training sample size of 2500 retinal images, the CNN was over 6 percentage points higher in accuracy than KNN. This could signify that CNN’s accuracy would continue growing at a higher rate with additional training samples, an acknowledgment that could prove beneficial to larger research groups with worldwide resources. 

Statistical Analysis

As outlined in the Statistical Background, a two-way ANOVA test (analysis of variance) with a significance level (α) of 0.05 was used to determine the experiment’s statistical significance, with the quantitative variable (accuracy) measured in reference to two categorical variables (type of algorithm and DR severity). For each training sample size, the test generated the following interaction p-values:

  Training Size   Interaction p-value
  125             0.0001347
  250             0.0002419
  375             0.0001786
  500             0.01463
  1000            0.000272
  1500            0.000077
  2000            0.00001857
  2500            0.0002475

All of the p-values from the ANOVA tests were less than the significance level of 0.05, which meant the null hypothesis, that there is no statistically significant relationship between the type of algorithm and its average accuracy across training sample sizes, was rejected. 
This signifies that there is a statistically significant relationship. In addition, the residuals followed a normal distribution, as shown by the Q-Q plot. Furthermore, every interaction a priori power generated by the ANOVA test was greater than 0.9850, signifying that the results are likely valid.
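The decision rule in this analysis reduces to checking every reported interaction p-value against α = 0.05; a minimal sketch using the values reported above:

```javascript
// Reported interaction p-values from the two-way ANOVA, keyed by
// training sample size (number of retinal images).
const ALPHA = 0.05;
const pValues = {
  125: 0.0001347,
  250: 0.0002419,
  375: 0.0001786,
  500: 0.01463,
  1000: 0.000272,
  1500: 0.000077,
  2000: 0.00001857,
  2500: 0.0002475,
};

// Reject the null hypothesis only if every test falls below alpha.
const rejectNull = Object.values(pValues).every((p) => p < ALPHA);
```

Every value, including the largest (0.01463 at n=500), is below α, so `rejectNull` is true, matching the conclusion drawn above.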


For each training sample size, the two-way ANOVA test conducted by the researchers generated p-values less than the significance level of 0.05, meaning the null hypothesis, that there is no statistically significant relationship between the type of algorithm and its average accuracy across training sample sizes, was rejected. From this, it can be concluded that there is a statistically significant relationship between the type of algorithm and accuracy in identifying specific stages of diabetic retinopathy in retinal images. This means that the type of algorithm played a role in its accuracy, rather than chance, indicating that the results are likely valid.

These results speak to the global health issues that motivated this study. The research and findings of this study will directly aid low-income efforts that seek widespread testing for diabetic retinopathy, a preventable disease. With the data on KNN and CNN, the investigators have provided some certainty about the use of such efficient algorithms in medical screening. Although ambiguity still lies in perfecting the accuracy of the two computing models, the researchers have drawn a defining line between the appropriate uses of KNN and CNN in the field. When granted further retinal samples, this study recommends the development of a convolutional neural network to best discern between DR severities. However, when granted fewer retinal samples (n≤250), this study suggests k-nearest neighbors as the more accurate, although not precise, algorithm to best discern between DR severities.
