Hi everyone,
INTRO: I have completed genome annotation of non-model organisms (fish) using MAKER. After three runs of MAKER, I filtered protein.fasta, removing all models with AED > 0.5, and BLASTed the remaining proteins against Swiss-Prot with:
blastp -evalue 1e-6 -max_hsps 1 -max_target_seqs 1
PROBLEM: I get quite a lot of cases where two or three gene models have the same blastp match, i.e. the same hypothetical product. Some of them are next to each other in the genome, and I assume this is due to spurious start/stop codons within the gene span, but others are in completely separate locations.
QUESTION: What causes this? Is it a problem with the assembly? And how do I fix this issue correctly?
Thanks a lot, Milos
First thought: these are probably paralogs. Also, just as a reminder, you should Google
-max_target_seqs 1
with NCBI BLAST+ in general, as it does not always return the "best" match.
Thank you!!! I had no clue about this issue with -max_target_seqs 1 before.
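For anyone landing here later, here is a minimal sketch of one way to inspect this, not a definitive fix. It assumes blastp was rerun with tabular output (-outfmt 6, default 12 columns) and without restricting to a single target, so that the best hit per gene model can be picked by bit score in the script instead of relying on -max_target_seqs 1. The function names and the file name in the usage note are hypothetical. It then lists every Swiss-Prot entry that is the top hit of more than one gene model, which are the candidates for paralogs or split gene models.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Yield rows from a BLAST tabular (-outfmt 6) file, one list per line."""
    with open(path, newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


def best_hits(rows):
    """Keep the highest bit score hit per query.

    Default -outfmt 6 columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def shared_subjects(best):
    """Return subjects that are the best hit of more than one query."""
    groups = defaultdict(list)
    for query, (subject, _bitscore) in best.items():
        groups[subject].append(query)
    return {s: sorted(q) for s, q in groups.items() if len(q) > 1}
```

Usage would look like shared_subjects(best_hits(load_rows("maker_vs_swissprot.tsv"))), where the file name is only a placeholder. Gene models in the same group that sit next to each other in the genome are worth checking as possible split models; groups spread across separate locations are more consistent with paralogs.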