How to specify cutoff parameters in tblastn?
4.1 years ago

I am working with a few protein sequences and a genome, and I want the output to consist only of the hits with identity >= 50 and coverage >= 50. I cannot find a command-line option for this; any suggestions on how to do this would be a great help. Thank you.

tblastn identity coverage blast • 1.0k views

The only parameter you could potentially use for tblastn is

 -qcov_hsp_perc <Real, 0..100>
   Percent query coverage per hsp

You will need to post-process the results for any other filtering you want to do.
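As a rough sketch of how that could look in practice, the snippet below only assembles a tblastn command line with `-qcov_hsp_perc` and a tabular output format that carries the columns needed for later filtering. The file names, the helper name `build_tblastn_command` and the column order are assumptions for illustration, not something from this thread; the command is built but not executed.

```python
import shlex

# Assumed column layout for the tabular output (format 6); any order works
# as long as the post-processing step reads the same one.
OUTFMT = "6 qseqid sseqid pident length qstart qend qlen sstart send evalue bitscore"


def build_tblastn_command(query_faa, genome_fna, out_tsv, min_qcov_hsp=50):
    """Return the tblastn argument list; nothing is run here."""
    return [
        "tblastn",
        "-query", query_faa,          # protein sequences (hypothetical file name)
        "-subject", genome_fna,       # genome FASTA (hypothetical file name)
        "-qcov_hsp_perc", str(min_qcov_hsp),  # per-HSP query coverage cutoff
        "-outfmt", OUTFMT,
        "-out", out_tsv,
    ]


cmd = build_tblastn_command("proteins.faa", "genome.fna", "hits.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if tblastn is installed; identity filtering would still happen afterwards on `hits.tsv`.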


oh okay, I will try that. Thank you!

4.1 years ago

You can specify the identity one as a parameter, but not the % coverage.

One approach: get the tabular output from BLAST and post-process it (Python, awk, Perl, ...). Keep in mind that getting an accurate result this way is doable but not entirely straightforward, because of the nature of your search: a protein will often be split over several HSPs ("exons") along the genomic sequence, so take that into account.

EDIT (in reference to @genomax's comment): what I wrote above is recommended/required if you want the stats per protein. The per-HSP ones you can define through BLAST parameters.
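To illustrate the per-protein case, here is a minimal sketch that merges the query intervals of all HSPs for each protein/subject pair and reports the fraction of the protein covered by their union. The function name `query_coverage` and the tuple layout are assumptions made for this example; it also ignores strand, HSP order and distance on the genome, so HSPs from unrelated loci on the same subject sequence would be pooled together.

```python
from collections import defaultdict


def query_coverage(hsps):
    """hsps: iterable of (qseqid, sseqid, qstart, qend, qlen) tuples.

    Returns {(qseqid, sseqid): percent of the query covered by the
    union of its HSPs on that subject}.
    """
    intervals = defaultdict(list)
    qlens = {}
    for qseqid, sseqid, qstart, qend, qlen in hsps:
        intervals[(qseqid, sseqid)].append((min(qstart, qend), max(qstart, qend)))
        qlens[qseqid] = qlen

    coverage = {}
    for key, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage[key] = 100.0 * covered / qlens[key[0]]
    return coverage


# Made-up HSPs: protA is hit by three HSPs on chr1, two of which overlap.
example = [
    ("protA", "chr1", 1, 40, 100),
    ("protA", "chr1", 30, 60, 100),
    ("protA", "chr1", 81, 90, 100),
    ("protB", "chr1", 1, 20, 100),
]
cov = query_coverage(example)
passing = {key: pct for key, pct in cov.items() if pct >= 50}
print(passing)
```

Merging intervals rather than summing HSP lengths avoids counting overlapping HSPs twice.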


Hey, what parameter can be used for identity? I tried -perc_identity but it's not available in tblastn.


ah, indeed.

I was confusing it with the fields you can request in the BLAST tabular output: there you can ask for pident or ppos, the percentage of identical and positive matches respectively (which are then suitable for post-processing).
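A minimal sketch of that kind of post-processing, assuming the tabular output was requested with the columns `qseqid sseqid pident length qstart qend qlen` in that order (the column choice, the function name `filter_hsps` and the example rows are made up for illustration). It keeps only HSPs with identity >= 50 and per-HSP query coverage >= 50:

```python
import csv
import io

# Assumed column order of the tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend", "qlen"]


def filter_hsps(handle, min_pident=50.0, min_qcov=50.0):
    """Return the HSP rows (as dicts) that pass both cutoffs."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        pident = float(rec["pident"])
        # Per-HSP query coverage: aligned query span over query length.
        qcov = 100.0 * (int(rec["qend"]) - int(rec["qstart"]) + 1) / int(rec["qlen"])
        if pident >= min_pident and qcov >= min_qcov:
            kept.append(rec)
    return kept


# Made-up rows standing in for a real output file.
example = io.StringIO(
    "protA\tchr1\t62.5\t80\t1\t80\t100\n"
    "protA\tchr1\t41.0\t90\t5\t94\t100\n"
    "protB\tchr2\t75.0\t30\t10\t39\t200\n"
)
hits = filter_hsps(example)
for rec in hits:
    print(rec["qseqid"], rec["sseqid"], rec["pident"])
```

For a real file, `open("hits.tsv", newline="")` can be passed in place of the `io.StringIO` object.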


My point was that none of this is possible on the BLAST command line. Post-processing would be needed.


true, post processing will be the only way here


Yeah, it's going to take a while to trim the output now!


Hey, I found this really time-saving and helpful for anyone who doesn't want to code. 1. Export the data to an Excel sheet. 2. Go to a new column. 3. Go to Formulas and insert an "IF" statement; a dialogue box will pop up with the condition: A1 < 50 (assuming the identity or coverage is in column A), value if true: "less_than_50", value if false: "not_less_than_50". Then click Done and drag the formula all the way down to the last sequence.

Now we have a column filled with "less_than_50" and "not_less_than_50".

After this, select the filter option, click on the new column and select "less_than_50". This shows the rows with identity < 50, which can then be deleted.
