13.7 years ago
Zhenhai Zhang
I'm currently working on 454 sequencing of a collection of antibodies. However, whenever there is a homopolymer run (e.g. AAAAAA), a frameshift may occur by chance. As a result, the nucleotide sequence identity (to germline) is very high but the protein sequence identity is very low.
Any idea how to correct the frameshifts using a Python script? Or can you recommend any mature tools?
Also, please bear in mind that antibody sequences have a very high mutation rate.
Thanks a lot!
This is in fact a much more common problem. Sequencing errors introduce frameshifts into reading frames, so the translated sequence no longer aligns with the target sequence. The problem is common in BlastX: you get 2 different hits (in different frames) for the same target sequence. I have suggested projects to rewrite BlastX a few times (allowing 1-nucleotide gaps instead of just full amino acid gaps). It is in fact quite irritating that a basic tool like that cannot handle real sequences. If it hasn't been done by now, maybe we should just do it.
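In the meantime, one possible workaround for the original question is to do the gap handling at the nucleotide level yourself. Below is a minimal Python sketch, not a tested pipeline: it aligns each read to its germline sequence with a plain Needleman-Wunsch alignment and then repairs every indel whose length is not a multiple of 3. The function names (`align`, `fix_frameshifts`), the scoring values and the toy sequences are all made up for illustration. It assumes you already know the closest germline for each read, that the read covers the germline end to end, and that a frame-breaking indel is a sequencing error rather than a real mutation. Point mutations and in-frame (3 nt) indels are left untouched.

```python
def align(read, ref, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment with a linear gap penalty.

    Returns the two sequences padded with '-' to equal length.
    The scoring values are arbitrary illustrative defaults.
    """
    n, m = len(read), len(ref)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    aln_read, aln_ref = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and read[i - 1] == ref[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aln_read.append(read[i - 1]); aln_ref.append(ref[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aln_read.append(read[i - 1]); aln_ref.append('-')
            i -= 1
        else:
            aln_read.append('-'); aln_ref.append(ref[j - 1])
            j -= 1
    return ''.join(reversed(aln_read)), ''.join(reversed(aln_ref))


def fix_frameshifts(read, germline):
    """Return the read with frame-breaking indels repaired.

    Insertions whose length is not a multiple of 3 are dropped;
    deletions whose length is not a multiple of 3 are filled with the
    germline bases (an assumption: the true base may have been mutated).
    """
    aln_read, aln_ref = align(read, germline)
    out = []
    i, length = 0, len(aln_read)
    while i < length:
        if aln_ref[i] == '-':              # extra bases in the read
            j = i
            while j < length and aln_ref[j] == '-':
                j += 1
            if (j - i) % 3 == 0:           # in-frame insertion: keep it
                out.append(aln_read[i:j])
            i = j
        elif aln_read[i] == '-':           # bases missing from the read
            j = i
            while j < length and aln_read[j] == '-':
                j += 1
            if (j - i) % 3 != 0:           # frame-breaking: fill from germline
                out.append(aln_ref[i:j])
            i = j
        else:                              # match or point mutation: keep read base
            out.append(aln_read[i])
            i += 1
    return ''.join(out)


# Toy example: one extra A in a homopolymer run plus one real point mutation.
germline = "ATGGCC" + "A" * 6 + "GGTCTGGAA"
read = "ATGGCC" + "A" * 7 + "GGTCTCGAA"
fixed = fix_frameshifts(read, germline)
print(fixed)
```

The pure-Python alignment is quadratic in sequence length, so it is only meant to show the idea. A stricter version could repair an indel only when it falls inside a homopolymer run, which is where 454 errors concentrate.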