International Journal of Chemical Studies
  • Printed Journal
  • Indexed Journal
  • Refereed Journal
  • Peer Reviewed Journal
P-ISSN: 2349-8528, E-ISSN: 2321-4902   |   Impact Factor: GIF: 0.565

Vol. 8, Issue 6 (2020)

Identification of a suitable technique for imputation of incomplete genotyping by sequencing (GBS) data


Author(s): Srikanth Bairi and AR Rao

Abstract: In DNA sequencing, genotyping by sequencing (GBS) is used to discover SNPs for genotyping studies. A common problem in GBS is the presence of missing observations. Standard statistical models often cannot handle such missing data, also known as incomplete data. One alternative is to impute the missing values before further downstream analysis. Hence, a study was conducted with two objectives: (i) to impute missing GBS data by various imputation techniques, based on both supervised and unsupervised learning algorithms, at different levels of missingness, and (ii) to identify a suitable imputation technique for incomplete GBS data. The accuracy of each imputation technique was assessed by the correlation coefficient and the mean squared prediction error (MSPE) between the imputed values and the true values. Different imputation techniques, viz., Mean Allele Frequency Imputation (MNI), Singular Value Decomposition Imputation (SVDI), k-Nearest Neighbour Imputation (kNNI), Locally Weighted Linear Regression Imputation (LWI), Expectation Maximization Imputation (EMI) and Random Forest Imputation (RFI), were applied to incomplete GBS data of mice, a model animal organism, to assess their performance. The results revealed that RFI was the most accurate imputation technique. The correlation coefficient for RFI at 5%, 10%, 15% and 20% missing data was 0.778, 0.765, 0.750 and 0.735, respectively. A similar trend was also observed for RFI in terms of mean squared prediction error. Thus, it is suggested to use the RFI technique to deal with incomplete GBS data prior to the application of genomic selection models for breeding value estimation.
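The evaluation scheme described in the abstract (mask known genotypes at a given missingness level, impute them, then compare imputed and true values by correlation coefficient and MSPE) can be illustrated with a minimal sketch. It is not the authors' code. It assumes genotypes coded as 0/1/2 allele counts, uses synthetic data rather than the mouse GBS data, and uses a simple per-SNP mean imputation as a stand-in for MNI. The function names `mask_genotypes`, `mean_impute` and `evaluate` are hypothetical.

```python
import numpy as np

def mask_genotypes(genotypes, missing_rate, rng):
    """Set a random fraction of entries to NaN; return the masked copy and the mask."""
    mask = rng.random(genotypes.shape) < missing_rate
    masked = genotypes.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask

def mean_impute(masked):
    """Fill each missing entry with the mean of the observed values of its SNP (column)."""
    col_means = np.nanmean(masked, axis=0)
    imputed = masked.copy()
    rows, cols = np.where(np.isnan(imputed))
    imputed[rows, cols] = col_means[cols]
    return imputed

def evaluate(true, imputed, mask):
    """Correlation and MSPE between true and imputed values at the masked positions only."""
    t = true[mask].astype(float)
    p = imputed[mask]
    r = np.corrcoef(t, p)[0, 1]
    mspe = np.mean((t - p) ** 2)
    return r, mspe

rng = np.random.default_rng(0)

# Synthetic genotypes (assumption): 200 individuals x 50 SNPs, coded as 0/1/2
# allele counts drawn from SNP-specific allele frequencies.
n_ind, n_snp = 200, 50
freqs = rng.uniform(0.1, 0.9, size=n_snp)
genotypes = rng.binomial(2, freqs, size=(n_ind, n_snp))

results = {}
for rate in (0.05, 0.10, 0.15, 0.20):  # missingness levels named in the abstract
    masked, mask = mask_genotypes(genotypes, rate, rng)
    imputed = mean_impute(masked)
    results[rate] = evaluate(genotypes, imputed, mask)
    r, mspe = results[rate]
    print(f"missing={rate:.0%}  r={r:.3f}  MSPE={mspe:.3f}")
```

Any other imputation method can be swapped in for `mean_impute` as long as it returns a complete matrix of the same shape, so the same `evaluate` call compares methods on identical masked positions.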

DOI: 10.22271/chemi.2020.v8.i6u.10967

Pages: 1467-1471


How to cite this article:
Srikanth Bairi, AR Rao. Identification of a suitable technique for imputation of incomplete genotyping by sequencing (GBS) data. Int J Chem Stud 2020;8(6):1467-1471. DOI: 10.22271/chemi.2020.v8.i6u.10967
 
