I am using -culling_limit 1 as a parameter.
From the manual: "Delete a hit that is enveloped by at least this many higher-scoring hits."
My understanding: the culling limit can be used to remove redundant hits. In practice, it sets the number of hits returned per subject sequence.
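To make the quoted rule concrete, here is a minimal Python sketch of a literal reading of it. It assumes that envelopment is judged on query coordinates only and that "higher-scoring" means a strictly higher bitscore. The names `Hit` and `cull_hits` are hypothetical, and this is not BLAST's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Minimal hit record: query coordinates plus a score used for ranking."""
    qstart: int
    qend: int
    bitscore: float


def query_range_envelops(outer: Hit, inner: Hit) -> bool:
    """True if outer's query range fully contains inner's query range."""
    return (min(outer.qstart, outer.qend) <= min(inner.qstart, inner.qend)
            and max(outer.qstart, outer.qend) >= max(inner.qstart, inner.qend))


def cull_hits(hits: list[Hit], culling_limit: int) -> list[Hit]:
    """Keep hits enveloped by fewer than `culling_limit` higher-scoring hits.

    Naive O(n^2) model of the manual's sentence; culling_limit is assumed >= 1.
    """
    kept = []
    for hit in hits:
        # Count strictly better hits whose query range contains this one.
        enveloping = sum(
            1
            for other in hits
            if other is not hit
            and other.bitscore > hit.bitscore
            and query_range_envelops(other, hit)
        )
        if enveloping < culling_limit:
            kept.append(hit)
    return kept
```

Under this reading, a hit is removed only if it lies entirely inside the query range of at least `culling_limit` better hits, so even with a limit of 1, several hits covering different parts of the query can all be kept.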
The command line:
$ blastn -query reads.fa -subject locus.fa -strand plus -culling_limit 1 -dust no -out result.csv -outfmt 6
One unexpected result:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QJLFG:08700:06611 gi|372099098:113208001-113426000 98.131 107 1 1 1 106 51978 52084 5.11E-49 185
QJLFG:08700:06611 gi|372099098:113208001-113426000 79.167 120 14 8 103 215 217412 217527 4.15E-15 73.1
QJLFG:08700:06611 gi|372099098:113208001-113426000 97.561 41 1 0 103 143 217437 217477 1.49E-14 71.3
As far as I understand, the third hit is redundant with respect to the second: it covers the same subject region and the same part of the read, but it is a shorter alignment with a higher e-value.
Why is it not discarded with -culling_limit 1?
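Before reasoning about which hit envelops which, it can help to load the tabular output into named fields. A small sketch, assuming the file is the tab-separated -outfmt 6 output with the 12 columns listed in the header line above (the forum rendering shows them space-separated); `parse_outfmt6` is a hypothetical helper name.

```python
import csv
import io

# Column order from the header line in the post (the default -outfmt 6 fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(text: str) -> list[dict]:
    """Parse tab-separated -outfmt 6 lines into dicts with typed values."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
        record = dict(zip(FIELDS, row))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        for name in FLOAT_FIELDS:
            record[name] = float(record[name])
        records.append(record)
    return records
```

With the records typed, the query ranges (qstart, qend) and bitscores can be compared directly instead of by eye.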
Hello,
I appreciate this is an old post, but I am having the same issue. Did you manage to find a solution to this problem? I am searching a large number of similar queries against about 2,000 genomes, and for some of these target sequences I am getting more than 50 hits with a culling limit of 1. Does -culling_limit not do what I think it does?
Thanks in advance.
That is an uncommon parameter that I have not used personally, but the help for that parameter says: "Delete a hit that is enveloped by at least this many higher-scoring hits."
If you are getting more than 50 hits, then perhaps they are all higher-scoring hits. Have you tried setting the parameter to a larger number? Are you looking to keep only one hit?
Hi GenoMax, sorry for the late response. Yes, I am looking to keep only one hit.
I am not sure I follow the idea that all the reported hits are higher-scoring hits. My interpretation of that definition is that the culling limit should remove the hits for which there is a higher-scoring hit. If I set it to 1, it should "delete a hit that is enveloped by at least 1 higher-scoring hit". Surely this means that only the highest-scoring hit would remain?
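If the goal is simply one hit per query, a post-filter on the tabular output is one option that does not depend on how culling behaves. A minimal sketch, assuming default -outfmt 6 columns (bitscore in column 12); `best_hit_per_query` is a hypothetical helper, and "best" here means highest bitscore.

```python
import csv


def best_hit_per_query(lines):
    """Return {qseqid: row}, keeping the row with the highest bitscore.

    Assumes default -outfmt 6 columns, so bitscore is the 12th field.
    Ties keep the first row seen.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, bitscore = row[0], float(row[11])
        if qseqid not in best or bitscore > float(best[qseqid][11]):
            best[qseqid] = row
    return best
```

For example, `with open("result.csv", newline="") as handle: best = best_hit_per_query(handle)`. Keying the dictionary on `(row[0], row[1])` instead would keep one hit per query/subject pair, which may suit a search against many genomes better.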
Hmm, that is pretty hard to read, so here are just the relevant fields:
format: qstart<->qend --- sstart<->send --- evalue
2nd hit: 103<->215 --- 217412<->217527 --- 4.15E-15
3rd hit: 103<->143 --- 217437<->217477 --- 1.49E-14
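The containment described for these two hits can be checked mechanically. This sketch parses the compact "qstart<->qend --- sstart<->send --- evalue" lines (the regex is an assumption about that ad hoc format) and tests whether the third hit lies inside the second on both the query and the subject.

```python
import re

# Matches the compact "qstart<->qend --- sstart<->send --- evalue" format.
PATTERN = re.compile(r"(\d+)<->(\d+)\s*---\s*(\d+)<->(\d+)\s*---\s*(\S+)")


def parse_compact(line: str):
    """Return ((qstart, qend), (sstart, send), evalue) from one compact line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    qs, qe, ss, se = (int(g) for g in match.groups()[:4])
    return (qs, qe), (ss, se), float(match.group(5))


def contains(outer, inner) -> bool:
    """True if interval `outer` fully contains interval `inner` (either orientation)."""
    return min(outer) <= min(inner) and max(outer) >= max(inner)


second = parse_compact("2nd hit : 103<->215 --- 217412<->217527 --- 4.15E-15")
third = parse_compact("3rd hit : 103<->143 --- 217437<->217477 --- 1.49E-14")

query_contained = contains(second[0], third[0])
subject_contained = contains(second[1], third[1])
print(query_contained, subject_contained, third[2] > second[2])
```

It prints `True True True`: on both axes the third hit sits inside the second, which also has the lower e-value, so under a literal reading of the help text it is enveloped by a higher-scoring hit.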