Sorting BLAST hits
2
0
Entering edit mode
2.6 years ago
Rebecca • 0

I am doing a BLAST search on a very repetitive transposon sequence on the maize B73 genome and need to report the top 50 alignments. I get back >100,000 alignments because each chromosome is considered one accession and all of the alignments on that chromosome are returned. I am using the BLAST command max_hsps to limit the number of alignments returned but I need a script or command to parse those results so that I can report only the top 50 alignments, sorted by bit score not by chromosome. Has anyone done this before? I need the actual alignments sorted not the results table.

Sort bitscore genome • 1.6k views
ADD COMMENT
0
Entering edit mode
2.6 years ago
jv ★ 1.8k

I need the actual alignments sorted not the results table

The results and the alignments are one and the same so it's really unclear how you plan to only sort the alignments without parsing the alignment results table, especially if you want to sort by bit score.

There are tools/packages available for working with standard blast-formated output, for example BioPerl and BioPython.

You also have options for customizing the output of blast into tab-delimited results (see Blast - Formatting Output for some examples) that may make it easier to parse.

ADD COMMENT
0
Entering edit mode

Sorry if I wasn't clear. I need both alignments and resort table sorted. Thanks for your response. I will check these out.

ADD REPLY
0
Entering edit mode

If you don't want to bring in another entire module, I have a helper script called blast_to_df.py that formats all the results available here. General use the two main ways of using it (as a script proper on the command line or as main an imported function) are illustrated thoroughly in a series of Jupyter notebooks that you can launch in the browser with no installation by going here and pressing 'launch binder'. When the session spins up, a list of available notebooks will come up. You probably could probably skip to the second notebook as the first is more a general overview and you seem familiar with the various issues. Sorting is demonstrated in those, too.

Because is a simpler, concise script, it may be easily adapted to meet your needs if it isn't covering it.

ADD REPLY
0
Entering edit mode
2.6 years ago

Hi,

It is highly not recommended to use max_hsps to reduce the number of matches since BLAST will stop searching after reaching this limit. It will not return the best results; see this paper. Edit: That was wrong, I've to step back. That was related to max_target_seqs .Sorry.

If you are using Linux, you can sort the table with sort. First properly, it makes sense to use an easier readable format with BLAST, such as:

blastn -num_threads 4 -db DATABASE -query YOUR_TRANSPOSON -outfmt "6 qseqid sseqid nident pident evalue bitscore score length mismatch gapopen gaps qcovs qcovhsp" > result.txt

sort result.txt --version-sort -k1,1 -k4,4 -k5,5r | sort --version-sort -u -k1,1 -k2,2

That's sorts according to the first column (accession), then the percentage identity, and then the highest bitscore (and remove all matches for the same accession with a worse match).

ADD COMMENT

Login before adding your answer.

Traffic: 2332 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6