Hi everyone,
I'm having trouble filtering BLAST output.
I'm using a large number of sequences as queries against a genome in a local tblastn search, which gives me a .txt output. I need to extract the best hit, which I've defined as the one with the lowest e-value, for each genomic region into which the genome is divided.
I tried sorting in Excel with the Filter command, but because the e-values are written like '1.08e-108', Excel only considers the digits before the 'e'. So for a hypothetical list containing the e-values 1.08e-108, 2.34e-10 and 1.03e-03, Excel always chooses 1.03e-03.
Next I tried sorting each genomic region with pandas, after loading the .txt output from BLAST into a DataFrame for easier manipulation, but the same thing happened as in Excel.
For now I'm selecting each best hit manually, but it is taking too much time.
Here's an example of the output:
BrflORs150.1 KN907735.1 23.616 271 186 6 40 299 80310 81092 1.41e-12 75.1
BrflORs150.2 KN907735.1 24.242 264 178 6 41 296 80313 81062 7.55e-09 63.5
BrflORs155.1 KN907735.1 24.825 286 204 4 23 303 80253 81092 1.29e-17 92.4
BrflORs155.1 KN907735.1 22.388 268 188 7 33 290 181025 181798 1.24e-10 70.1
BrflORs155.1 KN907735.1 24.908 273 181 5 41 302 32141 32920 1.84e-10 69.7
BrflORs155.1 KN907735.1 24.254 268 187 7 39 298 191353 192132 2.81e-10 68.9
BrflORs155.1 KN907685.1 24.739 287 199 8 25 303 37370 38203 9.68e-13 77.0
BrflORs155.1 KN907685.1 25.926 297 189 12 20 301 14077 14919 9.72e-09 63.9
BrflORs155.1 KN909062.1 21.379 290 204 6 23 300 50032 49199 3.01e-12 75.5
BrflORs155.1 KN909062.1 23.132 281 198 5 27 298 33061 32246 7.06e-11 70.9
BrflORs155.1 KN907432.1 25.862 290 181 8 28 300 166293 165475 2.98e-11 72.0
BrflORs155.1 KN906695.1 26.102 295 191 9 22 303 463829 464671 1.27e-10 70.1
BrflORs155.1 KN906695.1 26.689 296 188 8 22 303 485691 486533 3.83e-10 68.6
From those, for example, for the KN907735.1 region I'd need to select only the hit with an e-value of 1.29e-17, because it is the lowest one.
see if this suffices (works with OP data):
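The snippet from this reply is not reproduced here. As a stand-in, here is a minimal sketch of one possible approach in plain Python, not necessarily the code that was originally posted. It assumes the file has the 12 whitespace-separated columns shown in the question (query in column 1, genomic region in column 2, e-value in column 11), which looks like BLAST tabular output. The function name `best_hit_per_subject` is made up for illustration. The key step is converting the e-value with `float()`, because comparing values such as '1.03e-03' and '1.08e-108' as text would put 1.03e-03 first, which would likely explain the behaviour seen in Excel and pandas.

```python
# A few rows copied from the example in the question.
SAMPLE = """\
BrflORs150.1 KN907735.1 23.616 271 186 6 40 299 80310 81092 1.41e-12 75.1
BrflORs150.2 KN907735.1 24.242 264 178 6 41 296 80313 81062 7.55e-09 63.5
BrflORs155.1 KN907735.1 24.825 286 204 4 23 303 80253 81092 1.29e-17 92.4
BrflORs155.1 KN907735.1 22.388 268 188 7 33 290 181025 181798 1.24e-10 70.1
BrflORs155.1 KN907685.1 24.739 287 199 8 25 303 37370 38203 9.68e-13 77.0
BrflORs155.1 KN907685.1 25.926 297 189 12 20 301 14077 14919 9.72e-09 63.9
"""


def best_hit_per_subject(lines):
    """Return {genomic region: fields of the row with the lowest e-value}.

    Assumption: column 2 is the genomic region and column 11 the e-value.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        subject = fields[1]
        evalue = float(fields[10])  # numeric value, not text
        # Keep the row only if it beats the best e-value seen so far.
        if subject not in best or evalue < best[subject][0]:
            best[subject] = (evalue, fields)
    return {subject: fields for subject, (_, fields) in best.items()}


hits = best_hit_per_subject(SAMPLE.splitlines())
for subject, fields in hits.items():
    # Print region, query and e-value of the retained hit.
    print(subject, fields[0], fields[10])
```

To run it on a real file, pass an open file handle instead of `SAMPLE.splitlines()`. In pandas the same idea would be to make sure the e-value column is numeric (for example with `pd.to_numeric`) before sorting or grouping.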
On a different note have you tried the option
That may be all you need if you want to keep one good hit.
Thank you very much! It worked! You are a lifesaver :')