DeepCOVID-19: a model for identification of COVID-19 virus sequences with genomic signal processing and deep learning.
The worldwide spread of Coronavirus Disease 2019 (COVID-19) necessitates accurate identification methods and the determination of genetic relatedness. Results from genomic methods based on nucleotide alignment have prompted consideration of several alignment-free techniques for virus detection. This paper presents a genomic sequence identification model based on Genomic Signal Processing (GSP), deep learning, and genomic datasets of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), and Middle East Respiratory Syndrome Coronavirus (MERS-CoV). The Z-Curve images of the three viral strains were highly similar in texture and color, making the strains difficult to differentiate by visual inspection. However, the homogeneity distance showed that SARS-CoV-2 is closer to SARS-CoV than to MERS-CoV. The Convolutional Neural Network (CNN) classifier achieved a validation accuracy of 98.33%, indicating that the Z-Curve images of MERS-CoV, SARS-CoV, and SARS-CoV-2 have distinct features after transformation by the network. The divergence in texture and color reflects genetic variation among the strains, but it is too subtle to support differentiation by visual inspection. Our results show that the higher layers of the CNN amplify the aspects of the input images that are critical for discrimination, thereby confirming the importance of deep learning and GSP in accurate viral detection.
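As a minimal illustration of the GSP step, the standard Z-curve transform maps a nucleotide sequence to three cumulative coordinates. The sketch below assumes that textbook definition only. The function name `z_curve` is hypothetical, and the procedure by which the model renders these coordinates as images is not described in this abstract and is not reproduced here.

```python
from itertools import accumulate

def z_curve(sequence):
    """Return the cumulative Z-curve coordinates (x, y, z) of a nucleotide string.

    Illustrative sketch of the standard Z-curve definition, not the paper's code.
    """
    # Per-base steps along each axis:
    #   x: purine (A, G) = +1, pyrimidine (C, T) = -1
    #   y: amino (A, C)  = +1, keto (G, T)       = -1
    #   z: weak H-bond (A, T) = +1, strong H-bond (C, G) = -1
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    # Treat RNA 'U' as 'T'; skip ambiguous symbols such as 'N' (an assumption).
    seq = sequence.upper().replace("U", "T")
    deltas = [steps[base] for base in seq if base in steps]
    xs = accumulate(d[0] for d in deltas)
    ys = accumulate(d[1] for d in deltas)
    zs = accumulate(d[2] for d in deltas)
    return list(zip(xs, ys, zs))
```

For example, `z_curve("ACGT")` yields one `(x, y, z)` point per base; these three running sums are the signal from which an image representation could then be built.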