Dealing with sequencing errors in coding sequences

11.2 years ago

ishengomae ▴ 110

Some sequences downloaded from GenBank or any other major sequence database turn out to contain errors (e.g. spurious frameshift indels introduced by sequencing errors). I am wondering how people deal with this problem, especially in medium- to large-scale projects that will probably involve automated downstream analyses. My attempt has been to align each sequence against the corresponding RefSeq protein, but I find this method unreliable: first, it depends on the quality of the RefSeq protein; second, it is hard to locate the regions where sequencing errors cause problems, especially if the introduced indel(s) do not abruptly lead to stop codons. How do people deal with this?
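To make the kind of automated check concrete, here is a minimal Python sketch of a first-pass screen. It assumes each record is a plain nucleotide CDS string in the standard genetic code; the function name `screen_cds` and the flag names are hypothetical, made up for illustration. It only catches the easy cases (length not a multiple of three, missing start, in-frame internal stop, missing terminal stop), so it does not solve the harder case of indels that never run into a stop codon.

```python
# Stop codons of the standard genetic code (assumption: NCBI translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def screen_cds(seq):
    """Return a list of warning flags for a nucleotide coding sequence.

    An empty list means none of these simple checks fired; it does NOT
    mean the sequence is error-free.
    """
    seq = seq.upper()
    flags = []

    # A complete CDS should be a whole number of codons.
    if len(seq) % 3 != 0:
        flags.append("length_not_multiple_of_3")

    # Split into full codons in frame 0, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    if not codons or codons[0] != "ATG":
        flags.append("no_start_codon")

    # A stop codon anywhere before the last codon suggests a frameshift
    # or a nonsense error upstream.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        flags.append("internal_stop")

    if not codons or codons[-1] not in STOP_CODONS:
        flags.append("no_terminal_stop")

    return flags


if __name__ == "__main__":
    examples = {
        "clean": "ATGGCCTAA",
        "internal_stop": "ATGGCCTAAGGGTAA",
        "one_base_deleted": "ATGGCTAA",
    }
    for name, cds in examples.items():
        print(name, screen_cds(cds))
```

Sequences that get flagged could then be sent to the slower protein-level alignment, rather than aligning everything.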

genome sequence alignment • 1.5k views

updated 3.9 years ago by Ram 45k • written 11.2 years ago by ishengomae ▴ 110

I don't know how you can tell what is an error. If they're clear, maybe you can automatically fix them.

11.2 years ago by karl.stamm 4.1k
