4.1 years ago
anolidinasha
I'm doing an epitope prediction and conservation study using SARS-CoV-2 sequences. Most of the amino acid sequences (e.g., the spike protein) of SARS-CoV-2 isolates from different geographical regions contain unknown amino acids, marked as X. These cause errors in epitope prediction and in the epitope conservation study with the IEDB epitope conservation tool. What can I do to get around the problem caused by X? I can't simply delete the Xs and carry on with the analysis, because the X positions fall within the predicted epitopes.
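As a first diagnostic, it can help to see exactly which predicted epitopes overlap X positions in a given protein sequence. The sketch below is a minimal Python example, assuming epitopes are given as hypothetical `(name, start, end)` tuples with 1-based inclusive coordinates on that same sequence; the toy sequence and epitope names are made up for illustration.

```python
import re


def x_runs(protein):
    """Return 1-based inclusive (start, end) spans of consecutive X residues."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"X+", protein.upper())]


def epitopes_hitting_x(protein, epitopes):
    """Return names of epitopes (name, start, end; 1-based inclusive) overlapping any X run."""
    runs = x_runs(protein)
    return [name for name, start, end in epitopes
            if any(s <= end and start <= e for s, e in runs)]


# Hypothetical toy sequence and epitope coordinates, for illustration only.
seq = "MFVFLVLLPLVSSQXXXNLTTRTQ"
epitopes = [("ep1", 1, 9), ("ep2", 12, 20), ("ep3", 21, 24)]
print(x_runs(seq))                        # [(15, 17)]
print(epitopes_hitting_x(seq, epitopes))  # ['ep2']
```

Epitopes that never overlap an X can be analysed as usual; only the flagged ones need special handling.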
Either use a tool that can handle missing values (if one exists), or impute the missing residues (using the corresponding nucleotide sequences, if you have them).
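A conservation calculation that tolerates missing values could, for example, set aside any sequence whose epitope window contains an X instead of counting it as a mismatch, and report how many sequences were undetermined. This is a minimal sketch of that idea, not the IEDB tool's method; it assumes all protein sequences share the reference coordinates (no indels), and the function name and `threshold` parameter are hypothetical.

```python
def epitope_conservation(epitope, start, proteins, threshold=1.0):
    """Score one epitope across aligned protein sequences, treating X as missing.

    `start` is the 1-based epitope start on the shared (reference) coordinates.
    Returns (n_conserved, n_scored, n_undetermined).
    """
    n_conserved = n_scored = n_undetermined = 0
    length = len(epitope)
    for prot in proteins:
        window = prot[start - 1:start - 1 + length]
        if len(window) < length or "X" in window.upper():
            # Missing data: counted separately, not as a match or a mismatch.
            n_undetermined += 1
            continue
        # Fraction of epitope positions identical in this sequence.
        identity = sum(a == b for a, b in zip(epitope, window)) / length
        n_scored += 1
        if identity >= threshold:
            n_conserved += 1
    return n_conserved, n_scored, n_undetermined


# Toy example with made-up sequences.
proteins = ["NLTTRTQ", "NLTXRTQ", "NLTTKTQ"]
print(epitope_conservation("NLTTRTQ", 1, proteins))  # (1, 2, 1)
```

Reporting conservation as `n_conserved / n_scored` together with `n_undetermined` keeps the missing data visible instead of silently lowering the score.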
I downloaded the SARS-CoV-2 nucleotide sequences from the GISAID database and only then translated them into amino acid sequences, using a reference sequence. The Xs come from the unknown N nucleotides in the SARS-CoV-2 sequences I selected. So, is there a way to work with the Xs without having to replace these sequences with new ones?
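An N does not always have to become an X: if the codon is fourfold degenerate (e.g. CTN or GGN), every possible base gives the same amino acid. The sketch below translates an in-frame nucleotide sequence with the standard genetic code and resolves N by enumerating all possible codons, emitting X only when they disagree. It assumes the sequence already starts at the gene's reading frame (e.g. the spike ORF located via the reference); other IUPAC ambiguity codes are conservatively translated as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon):
    """Translate one codon; N is resolved if all possible bases agree."""
    # Replace each N by the four possible bases, keep other characters as-is.
    options = [BASES if base == "N" else base for base in codon.upper()]
    # Codons with unknown characters (e.g. other IUPAC codes) map to X.
    amino_acids = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(nucleotides):
    """Translate an in-frame nucleotide sequence, ignoring a trailing partial codon."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


print(translate("ATGCTNAANTTT"))  # MLXF (AAN could be Asn or Lys)
```

Depending on how many Ns fall in third codon positions of degenerate codons, this can recover some residues that a plain translation would report as X.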
You can impute the missing nucleotides, for example with an HMM or a random forest. Alternatively, find another tool that can handle missing data, or exclude sequences with missing information.
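HMM- or random-forest-based imputation needs a trained model. As a much simpler stand-in (not either of those methods), the sketch below first drops sequences whose fraction of N exceeds a threshold and then fills each remaining N with the most common observed base at that column among the kept sequences. It assumes the nucleotide sequences are already aligned to equal length; the function names and the 5% default threshold are hypothetical placeholders.

```python
from collections import Counter


def n_fraction(seq):
    """Fraction of N characters in a nucleotide sequence (1.0 for empty input)."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def impute_majority(aligned, max_n_fraction=0.05):
    """Drop sequences with too many Ns, then fill each remaining N with the most
    common observed base (ignoring N and gaps) in that alignment column."""
    kept = [s.upper() for s in aligned if n_fraction(s) <= max_n_fraction]
    if not kept:
        return []
    length = len(kept[0])
    if any(len(s) != length for s in kept):
        raise ValueError("sequences must be aligned to the same length")
    fill = []
    for col in range(length):
        counts = Counter(s[col] for s in kept if s[col] not in "N-")
        # Leave N in place if no kept sequence has an observed base here.
        fill.append(counts.most_common(1)[0][0] if counts else "N")
    return ["".join(fill[i] if base == "N" else base for i, base in enumerate(s))
            for s in kept]


# Toy alignment: the last sequence is all N and gets filtered out.
aligned = ["ATGCTT", "ATGNTT", "ATGCTA", "NNNNNN"]
print(impute_majority(aligned, max_n_fraction=0.2))  # ['ATGCTT', 'ATGCTT', 'ATGCTA']
```

Imputed bases are guesses, so epitopes covering imputed positions are worth flagging in the conservation results rather than treated as observed data.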
Thank you very much for your suggestions.