It just happens that some sequences downloaded from GenBank or any other major sequence database turn out to contain errors (e.g. spurious frameshift indels introduced by sequencing errors). I am wondering how people deal with this problem, especially in medium- to large-scale projects that involve automated downstream analyses. My approach has been to align each sequence against the corresponding RefSeq protein, but I find this method unreliable for two reasons: first, it depends on the quality of the RefSeq protein itself; second, it is hard to locate the regions where sequencing errors cause problems, especially when the introduced indel(s) do not immediately lead to a premature stop codon. How do people deal with this?
I don't know how you can tell what is an error. If the errors are clear-cut, maybe you can fix them automatically.
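One way to pick out the "clear" cases automatically is to screen each coding sequence for the two obvious symptoms of a frameshift: a length that is not a multiple of 3, and stop codons inside the reading frame. The sketch below assumes the standard genetic code (NCBI translation table 1) and that each sequence is a CDS starting in frame 1; the function name `check_cds` is hypothetical. As the question points out, it will not catch indels that keep the frame open, so it only flags candidates for closer inspection.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_cds(seq):
    """Flag simple signs of a frameshift in a CDS read in frame 1.

    Returns a list of human-readable problems (empty if none were found).
    """
    seq = seq.upper().replace("U", "T")
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    # Translate only complete codons; ambiguous codons become 'X'
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    # A single terminal stop codon is expected, so ignore it
    body = protein[:-1] if protein.endswith("*") else protein
    for pos, aa in enumerate(body):
        if aa == "*":
            # Report the 1-based nucleotide position of the stop codon
            problems.append(f"internal stop codon at nt {3 * pos + 1}")
    return problems
```

For example, `check_cds("ATGTAAAAATAA")` returns `['internal stop codon at nt 4']`, while a clean ORF such as `"ATGAAATAA"` returns an empty list. Sequences that pass this screen can still carry frame-preserving errors, which is where comparison against a reference protein remains necessary.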