Hello everyone, I'd like to explore the functional impact of some InDels on their protein products. I annotated these mutations with Ensembl and got the corresponding transcripts. Then I translate these transcripts, with the InDels applied, into peptides for downstream analysis.
But some transcripts from Ensembl have one or more "N" at the beginning of the sequence, and I don't know how to deal with them.
Below is an example transcript from Ensembl that starts with "N":
NTCAGGACAGTCGCTGAAATCACTGAATGCCTCCTCAGGTCATTTAGCACTTATTTTATCCAGTATCTTTGGGCTCCTTCTCCTGGTTCTGTTTATTCTATTTCTCACGTGGTGCCGAGTTCAGAAACAAAAACATCTGCCCCTCAGAGTTTCAACCAGAAGGAGGGGTTCTCTCGAGGAGAATTTATTCCATGAGATGGAGACCTGCCTCAAGAGAGAGGACCCACATGGGACAAGAACCTCAGATGACACCCCCAACCATGGTTGTGAAGATGCTAGCGACACATCGCTGTTGGGAGTTCTTCCTGCCTCTGAAGCCACAAAATGA
So, my questions are:
- Why do these transcripts start with "N"?
- How should the "N" be handled when translating the transcript into peptides?
- Or should we just drop the transcripts that contain "N"?
Thank you very much.
Someone from Ensembl may be along with an informed comment but perhaps the N was added to shift the frame (if that frame is producing sane results)? Just speculating.
Thank you. I found the corresponding peptide for the example transcript in Ensembl; below is the sequence:
It seems that the "N" doesn't make any sense here, because translation of the transcript starts from the second nucleotide. So the "N" here looks weird.
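To make that observation concrete, here is a minimal Python sketch of translating a transcript from the first non-N base. It assumes the leading N's are only padding and that the reading frame begins right after them, which is what this one example suggests; I have not confirmed that it holds for every Ensembl transcript. The function names are made up for illustration, and any codon that still contains an N is reported as "X".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate from position 0 in frame, stopping at the first stop codon.

    Codons containing N (or any other non-ACGT character) become 'X'.
    """
    seq = seq.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def translate_after_leading_n(seq):
    """Assumption: leading N's are padding, so drop them before translating."""
    return translate(seq.lstrip("Nn"))


# First ten bases of the example transcript above.
print(translate_after_leading_n("NTCAGGACAG"))  # SGQ
```

Comparing the output of `translate_after_leading_n` with the peptide Ensembl reports for the same transcript would show whether the assumption holds before it is applied to the whole set.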
Sounds reasonable, assuming that either zero, one or two Ns are at the beginning - which OP could check?
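One rough way to run that check is to tally how many N's each transcript starts with. The sketch below assumes the transcripts are available as FASTA text; the helper names and the demo records are hypothetical, and a real run would pass an open file handle instead of the in-memory string.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def leading_n_counts(handle):
    """Map 'number of leading N's' to 'number of transcripts with that many'."""
    counts = Counter()
    for _, seq in read_fasta(handle):
        counts[len(seq) - len(seq.lstrip("Nn"))] += 1
    return counts


# Demo records (made up): zero, one and two leading N's.
demo = io.StringIO(">t1\nATGAAA\n>t2\nNTCAGG\n>t3\nNNGGAC\nTTT\n")
print(leading_n_counts(demo))
```

If the only keys that show up are 0, 1 and 2, that would be consistent with the N's padding the sequence to a codon boundary; anything larger would need a different explanation.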