Blast standalone output formatting
7.6 years ago
sukesh1411 ▴ 30

Hi,

I am running standalone tblastn on Ubuntu. I want to sort the output of format 0 (pairwise) by percentage identity. I can do this with output format 6 or 7, but not with 0.

Thanks

blast • 3.3k views

To get more help, please provide the command that you're running to sort it, and an example of the output (4-5 lines).


Hi

The command I used for tblastn:

tblastn -query allpeptide.fa -subject BDCS01.1.fsa_nt -outfmt 0 > results.txt

With format 6 or 7 I can export the results to Excel and sort them by percentage identity, but those formats do not show the pairwise alignments.

Output file in format 0 (pairwise):

Query= peptides

Length=255

Subject= dbj|BDCS01000002.1| Momordica charantia DNA, contig: scaffold_1,

strain: OHB3-1, whole genome shotgun sequence

Length=6565189

Score = 31.2 bits (69),  Expect = 0.14, Method: Compositional matrix adjust. 
Identities = 21/65 (32%), Positives = 30/65 (46%), Gaps = 3/65 (5%)
Frame = +3

Query  106      LYGGSSLS-PWYMCARMLLETNPKFSPNEYCFTYERLGGNSLAKVDEVYDCNARNMMNNA  164
                L+GG  L   W +  R + E + K     +CF  +R       K + +Y C  RN  N+A
Sbjct  6112794  LFGGDILVYIWQIITRFMNEIHKKKKSLCFCFEVQRNHFLVATKENHIYIC--RNCFNHA  6112967

Query  165      EPALS  169
                 P LS
Sbjct  6112968  IPPLS  6112982


Score = 26.9 bits (58),  Expect = 2.8, Method: Compositional matrix adjust.
Identities = 11/19 (58%), Positives = 13/19 (68%), Gaps = 0/19 (0%)
Frame = -3

Query  227      STCTEPSLSGSYLPHCGWH  245
                S+C E SLS S+   CGWH
Sbjct  6340289  SSCDENSLSNSFEHGCGWH  6340233

Query= peptides

Length=255

 Subject= dbj|BDCS01000005.1| Momordica charantia DNA, contig: scaffold_4,

strain: OHB3-1, whole genome shotgun sequence

Length=5144829


 Score = 27.7 bits (60),  Expect = 1.6, Method: Compositional matrix adjust.
Identities = 11/26 (42%), Positives = 16/26 (62%), Gaps = 0/26 (0%)
 Frame = +3

Query  200      NLPMDMTVWADPYCVTMDMPKMSKYG  225
                NLP D  V  DP+CV++D  +  K+ 
Sbjct  1511976  NLPADGAVGVDPWCVSVDTAQKWKHA  1512053

Thanks


Please apply the "code" tag to the output you pasted; it's impossible to read like this. It's the 5th button from the left in your editor.


As @genomax2 said here, such a file can't be sorted directly, because its lines do not all share the same structure, and that is what sorting relies on. Some BioPerl or Biopython modules may exist for this type of file, but it definitely can't be done with plain command-line sorting.


How would one be able to sort pairwise text alignment results (format 0)? The other formats are sortable because they are tabular. If you need to keep only results above a certain alignment % (or something along those lines), use an appropriate combination of BLAST parameters along with your -outfmt 0 directive.


I suggest rerunning BLAST with -outfmt 11 and then converting the result to any other outfmt (6 or 7) with blast_formatter.
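Once the archive has been converted to tabular form (for example with blast_formatter -archive ... -outfmt 6), the sorting that was being done in Excel can also be scripted. Below is a minimal Python sketch, assuming the default 12-column layout of format 6/7. The function name sort_tabular_by_identity is made up for illustration, and the demo rows were transcribed by hand from the three alignments shown in this thread (the subject IDs are shortened guesses), so they are not real tblastn output.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6 / 7).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def sort_tabular_by_identity(text):
    """Return hit rows as dicts, sorted by percent identity (highest first)."""
    # Format 7 adds comment lines starting with '#'; skip those and blank lines.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = [dict(zip(COLUMNS, fields))
            for fields in csv.reader(lines, delimiter="\t")]
    return sorted(rows, key=lambda r: float(r["pident"]), reverse=True)


# Illustrative rows only: values derived by hand from the pairwise output
# posted above, with assumed subject identifiers.
DEMO_ROWS = [
    ["peptides", "BDCS01000002.1", "32.308", "65", "41", "2",
     "106", "169", "6112794", "6112982", "0.14", "31.2"],
    ["peptides", "BDCS01000002.1", "57.895", "19", "8", "0",
     "227", "245", "6340289", "6340233", "2.8", "26.9"],
    ["peptides", "BDCS01000005.1", "42.308", "26", "15", "0",
     "200", "225", "1511976", "1512053", "1.6", "27.7"],
]
DEMO = "# comment line as written by -outfmt 7\n" + "\n".join(
    "\t".join(row) for row in DEMO_ROWS)

if __name__ == "__main__":
    for row in sort_tabular_by_identity(DEMO):
        print(row["sseqid"], row["pident"], row["qstart"], row["qend"])
```

The query and subject coordinates in the sorted rows can then be used to look up the matching alignments in a format 0 report produced from the same archive.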

7.6 years ago
Joe 21k

Your best bet is probably SearchIO from (Bio)python/perl.

http://biopython.org/wiki/SearchIO

It has parsers that are capable of reading in BLAST results. I assume that what you're calling pairwise format is what they call 'plain-text', in which case that format is supported by the parser. Alternatively, XML might be an option.

Once you've got all the records as objects, you can implement a sort/filter in Python or similar.
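To illustrate the "read records as objects, then sort" idea without depending on any particular parser, here is a rough standard-library sketch. It is not SearchIO: it simply splits format 0 text into one block per "Score =" section, reads the percentage from the "Identities =" line, and sorts the blocks. The function names, the regular expressions, and the rule that a line starting with "Lambda" ends the last block are all assumptions based on the output posted in this thread, so check them against your own files.

```python
import re

QUERY_RE = re.compile(r"^\s*Query=\s*(.*)$")
SUBJECT_RE = re.compile(r"^\s*Subject=\s*(.*)$")
SCORE_RE = re.compile(r"^\s*Score\s*=")
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")


def parse_hsps(text):
    """Split pairwise (format 0) text into one dict per alignment block."""
    hsps = []
    query = subject = None
    current = None
    for line in text.splitlines():
        q = QUERY_RE.match(line)
        s = SUBJECT_RE.match(line)
        if q or s or line.startswith("Lambda"):
            # A new query/subject header (or the statistics trailer,
            # assumed to start with "Lambda") closes the current block.
            if q:
                query = q.group(1).strip()
            if s:
                subject = s.group(1).strip()
            current = None
        elif SCORE_RE.match(line):
            current = {"query": query, "subject": subject,
                       "pident": 0.0, "lines": [line.rstrip()]}
            hsps.append(current)
        elif current is not None:
            ident = IDENT_RE.search(line)
            if ident:
                current["pident"] = (100.0 * int(ident.group(1))
                                     / int(ident.group(2)))
            current["lines"].append(line.rstrip())
    return hsps


def sort_by_identity(text):
    """Return alignment blocks sorted by percent identity, highest first."""
    return sorted(parse_hsps(text), key=lambda h: h["pident"], reverse=True)


def render(hsps):
    """Rebuild a readable report, repeating the headers for every block."""
    parts = []
    for h in hsps:
        body = "\n".join(h["lines"]).rstrip()
        parts.append(f"Query= {h['query']}\nSubject= {h['subject']}\n\n{body}\n")
    return "\n".join(parts)


# Abbreviated copy of the output posted in this thread.
SAMPLE = """
Query= peptides
Length=255
Subject= dbj|BDCS01000002.1| Momordica charantia DNA, contig: scaffold_1,
Length=6565189

 Score = 31.2 bits (69),  Expect = 0.14, Method: Compositional matrix adjust.
 Identities = 21/65 (32%), Positives = 30/65 (46%), Gaps = 3/65 (5%)
 Frame = +3

 Score = 26.9 bits (58),  Expect = 2.8, Method: Compositional matrix adjust.
 Identities = 11/19 (58%), Positives = 13/19 (68%), Gaps = 0/19 (0%)
 Frame = -3

Query  227      STCTEPSLSGSYLPHCGWH  245
                S+C E SLS S+   CGWH
Sbjct  6340289  SSCDENSLSNSFEHGCGWH  6340233

Query= peptides
Length=255
Subject= dbj|BDCS01000005.1| Momordica charantia DNA, contig: scaffold_4,
Length=5144829

 Score = 27.7 bits (60),  Expect = 1.6, Method: Compositional matrix adjust.
 Identities = 11/26 (42%), Positives = 16/26 (62%), Gaps = 0/26 (0%)
 Frame = +3
"""

if __name__ == "__main__":
    print(render(sort_by_identity(SAMPLE)))
```

Only the first line of a multi-line subject title is kept in this sketch, and the percentage is recomputed from the identity counts rather than taken from the rounded value in parentheses.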
