8.0 years ago
bei
Hi, I was wondering if anyone could tell me how to filter BLAST output according to sequence name? Thanks!
For example, BLAST gives the following output:
A11_610 gi|502439232 68.4 57 18 3.1e-14 85.5
A11_1273 gi|951490813 85.3 68 10 1e-24 120.6
A11_1116 gi|476506208 65.3 4 71 2.1e-11 76.6
A11_1132 gi|497849802 97.9 48 48 8.3e-17 94.4
And a second text file contains the sequence names:
A11_610
A11_1273
How can I filter the BLAST output so that it only contains A11_610 and A11_1273?
Always remember the potential errors introduced by `grep` and `grep -f`, e.g. the pattern `A11_610` matches more than just `A11_610`.
CSV/TSV tools are a better choice. For my csvtk, use:
EDIT: Sorry, I overlooked the `-w` option. Anyway, CSV/TSV tools that can search specific columns generally run faster.
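The over-matching described above is easy to reproduce. This is only a minimal sketch, assuming the BLAST table is tab-separated (-outfmt 6) with the query name in column 1. The `demo_*` file names and the extra `A11_6100` row (including its gi number) are made up for illustration, and the `awk` line is a generic column-aware stand-in, not the csvtk command.

```shell
# Hypothetical BLAST table; the A11_6100 row is invented to show the pitfall.
printf 'A11_610\tgi|502439232\t68.4\n'  >  demo_blast.tsv
printf 'A11_6100\tgi|000000000\t70.0\n' >> demo_blast.tsv
printf 'A11_1273\tgi|951490813\t85.3\n' >> demo_blast.tsv
printf 'A11_610\nA11_1273\n' > demo_names.txt

# Plain grep -f treats each name as a substring pattern, so A11_6100 slips through.
grep -f demo_names.txt demo_blast.tsv > demo_loose.tsv

# -w requires whole-word matches; -F treats the names as fixed strings, not regexes.
grep -w -F -f demo_names.txt demo_blast.tsv > demo_word.tsv

# Column-aware alternative: keep rows whose first column is exactly a listed name.
awk -F'\t' 'NR==FNR { keep[$1]; next } $1 in keep' demo_names.txt demo_blast.tsv > demo_exact.tsv
```

Note that `-w` still looks at the whole line, so a name that also appears in another column would match as well. The `awk` version only ever compares column 1.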
Indeed, `-w` would have tackled that. But another danger with `-f` is having an empty line in your file, which will match everything...
Thanks for your quick reply!
I have two files: one is a BLAST output file (-outfmt 6), the other is a text file containing some sequence names. I just want to filter the BLAST output file to keep only the needed sequences (for example: A11_610 and A11_1273).
I have run your command; however, it doesn't work for me.
What does that mean? What was the result? In my command, file2.txt would be the file with the needed sequences and file1.txt would be the blast output file.
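The command under discussion is not shown in this thread. Assuming it was a `grep -f` style filter, the sketch below illustrates the argument order described above (name list after `-f`, BLAST file last) and guards against the empty-line problem. The rows are taken from the question; the tab separators are an assumption based on -outfmt 6, and `file2.clean.txt`, `unfiltered.txt` and `filtered.txt` are hypothetical names.

```shell
# Run in an empty scratch directory: this overwrites file1.txt and file2.txt.
# file1.txt: BLAST output; file2.txt: wanted names, with a stray empty line at the end.
printf 'A11_610\tgi|502439232\t68.4\t57\t18\t3.1e-14\t85.5\n'  >  file1.txt
printf 'A11_1273\tgi|951490813\t85.3\t68\t10\t1e-24\t120.6\n'  >> file1.txt
printf 'A11_1116\tgi|476506208\t65.3\t4\t71\t2.1e-11\t76.6\n'  >> file1.txt
printf 'A11_1132\tgi|497849802\t97.9\t48\t48\t8.3e-17\t94.4\n' >> file1.txt
printf 'A11_610\nA11_1273\n\n' > file2.txt

# The empty line acts as an empty pattern, which matches every row.
grep -F -f file2.txt file1.txt > unfiltered.txt

# Drop empty lines from the name list first, then filter.
grep -v '^$' file2.txt > file2.clean.txt
grep -w -F -f file2.clean.txt file1.txt > filtered.txt
```

With the two files swapped, `grep` would search the name list for BLAST rows and most likely print nothing, which may look like the command "doesn't work".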
Thank you! I made a mistake: I had used file1.txt for the sequence names. Your command works!
Do you have a faster command? I have millions of sequences. Right now I am just testing 300 sequences with 50 BLAST hits, and the computer is still filtering.
Sort both files and use `join` (https://linux.die.net/man/1/join), e.g. to merge the two files.
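A rough sketch of the sort-and-join idea, assuming a tab-separated BLAST table with the query name in column 1 and one name per line in the list. The `join_*` file names are hypothetical and the rows are shortened from the question.

```shell
# Hypothetical demo inputs.
printf 'A11_610\tgi|502439232\t68.4\n'  >  join_blast.tsv
printf 'A11_1273\tgi|951490813\t85.3\n' >> join_blast.tsv
printf 'A11_1116\tgi|476506208\t65.3\n' >> join_blast.tsv
printf 'A11_1132\tgi|497849802\t97.9\n' >> join_blast.tsv
printf 'A11_610\nA11_1273\n' > join_names.txt

# A literal tab, used as the field separator for sort and join.
TAB="$(printf '\t')"

# join needs both inputs sorted on the join field under the same collation.
LC_ALL=C sort -t "$TAB" -k1,1 join_blast.tsv > join_blast.sorted.tsv
LC_ALL=C sort -u join_names.txt > join_names.sorted.txt

# Join on column 1 of both files; only BLAST rows with a listed name are printed.
LC_ALL=C join -t "$TAB" join_names.sorted.txt join_blast.sorted.tsv > join_hits.tsv
```

The output comes back in sorted order rather than the original BLAST order. If sorting millions of rows is itself too slow, a single-pass hash lookup such as `awk -F'\t' 'NR==FNR{keep[$1];next} $1 in keep' names blast` may be worth comparing, since it avoids sorting altogether.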