9.9 years ago
biolab
Hi everyone,
I am using USEARCH to identify orthologous genes between two species. I set an E-value cutoff of 1e-5 and used the top-hit option. However, I am skeptical of this in silico method; an example is shown below. Is something wrong with my method? Thanks a lot for any suggestions!
Query >LOC_Os07g04960_1
Score Evalue %Id QueryLo-Hi(Un) TargetLo-Hi(Un) Target
228 3e-19 42% 39-149(2) 268-383(18) AT5G15780_1
Qry 39 PAAAIPAVPAMPKPTIPTIVPAVTLPPIPAVPKVTLPPMPAIPTVPAVTMPPMPAVPAVPAVTLPPMPAVPTVPPNTVV 117
| . || | | ||.| | |||| | :| .|||.| ||| | |:| .| .| | ||||.| :||.|| |.
Tgt 268 PPSIIP-----PNPLIPSI-PTPTLPPNPLIPSPPSLPPIPLIPTPP--TLPTIPLLPTPPTPTLPPIPTIPTLPPLPVL 339
Qry 118 VPAAVV--PALP------KVALPPMAAVPNVP----MPFLAPPP 149
| :| |.|| | |||. .| :| .| : | |
Tgt 340 PPVPIVNPPSLPPPPPSFPVPLPPVPGLPGIPPVPLIPGIPPAP 383
124 cols, 52 ids (41.9%), 21 gaps (16.9%), score 228.0 (92.4 bits), Evalue 2.5e-19
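The percentages in this summary line are consistent with dividing by the total number of alignment columns: 52/124 = 41.9% identities and 21/124 = 16.9% gap columns. Below is a small Python sketch that counts these quantities from two gapped alignment rows. It assumes that convention and that "-" marks a gap; other tools may define percent identity with a different denominator.

```python
def alignment_stats(qry: str, tgt: str) -> dict:
    """Count columns, identities and gap columns in a gapped pairwise alignment.

    Assumes '-' is the gap character and that percentages are taken over all
    alignment columns (gapped columns included).
    """
    if len(qry) != len(tgt):
        raise ValueError("aligned rows must have equal length")
    cols = len(qry)
    # A column is an identity when both residues match and neither is a gap.
    ids = sum(1 for a, b in zip(qry, tgt) if a == b and a != "-")
    # A column is a gap column when either row has a gap there.
    gaps = sum(1 for a, b in zip(qry, tgt) if a == "-" or b == "-")
    return {
        "cols": cols,
        "ids": ids,
        "gaps": gaps,
        "pct_id": 100.0 * ids / cols,
        "pct_gaps": 100.0 * gaps / cols,
    }
```

For example, `alignment_stats("PAV-PA", "PAVKP-")` reports 6 columns, 4 identities and 2 gap columns.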
In the example you posted, most of the alignment appears to fall in a low-complexity region. First, USEARCH might not be a good choice if you want to identify highly diverged sequences. From the manual:
See whether you can avoid the problem in your example by masking repetitive and low-complexity regions with "seg" instead of the default masking method USEARCH uses. Is there any other specific reason to doubt the accuracy of USEARCH?
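To illustrate what masking a low-complexity region means, here is a simplified Python sketch that soft-masks (lower-cases) residues covered by any window whose residue composition has low Shannon entropy. This is NOT the SEG algorithm and NOT USEARCH's built-in masking; the window size (12) and the entropy threshold (2.2 bits) are assumed values chosen only for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def soft_mask_low_complexity(seq: str, window: int = 12,
                             threshold: float = 2.2) -> str:
    """Lower-case residues in any window whose entropy is below threshold.

    Simplified illustration only (not SEG); window and threshold are assumptions.
    Residues outside masked windows are returned upper-case.
    """
    masked = [False] * len(seq)
    # Slide a fixed-size window; sequences shorter than the window are left unmasked.
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(ch.lower() if m else ch.upper() for ch, m in zip(seq, masked))
```

With these settings, a proline-rich 12-residue stretch from the query above, such as PAAAIPAVPAMP, has an entropy of about 1.95 bits and would be masked, whereas twelve distinct residues give log2(12), about 3.58 bits, and stay unmasked.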
Hi Siva, thank you very much for your reply. Your comment is very helpful. I need to set an identity cutoff.
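A minimal Python sketch of the filtering discussed in this thread (E-value cutoff of 1e-5, one top hit per query, plus an identity cutoff). It assumes the hits were written in BLAST-style 12-column tabular format, which USEARCH can produce with `-blast6out`, and it picks the top hit by bit score. The function name `top_hits` and the `min_pident` default are hypothetical placeholders, not a recommended threshold.

```python
import csv
from typing import Dict, Iterable

# Column layout of BLAST-style 12-column tabular output ("blast6").
FIELDS = ["query", "target", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "tstart", "tend", "evalue", "bitscore"]


def top_hits(lines: Iterable[str], max_evalue: float = 1e-5,
             min_pident: float = 0.0) -> Dict[str, dict]:
    """Keep, per query, the highest-bit-score hit that passes both cutoffs."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        pident = float(hit["pident"])
        bits = float(hit["bitscore"])
        # Apply the E-value and identity cutoffs before ranking.
        if evalue > max_evalue or pident < min_pident:
            continue
        hit.update(evalue=evalue, pident=pident, bitscore=bits)
        current = best.get(hit["query"])
        if current is None or bits > current["bitscore"]:
            best[hit["query"]] = hit
    return best
```

Note that the alignment in the question (E-value 2.5e-19, 41.9% identity) passes the 1e-5 E-value cutoff, so an E-value threshold alone does not exclude it.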
Depending on your species of interest, you may want to have a look at the orthologues from the Comparative Genomics analyses in Ensembl.
Thank you, Denise; your comments are helpful.