Outputing a Blast result that includes global pairwise alignment
0
0
Entering edit mode
8.8 years ago
Nano ▴ 20

I have made a Local Blast of one query file (containing thousands of sequences) against many individual database formatted files (each containing thousands of sequences). The result is good. Script below

#!/bin/bash
for file in *.fasta
do
  pb tblastn -query queryFile.fasta -dbnuc $file -out $file\.table -evalue 0.00001 -outfmt 6
done

The results in each file look like this:

BGIBMGA000034   c9147_g1_i1     33.95   162     97      2       42      193     120     605     1e-22   93.2
BGIBMGA000034   c13162_g1_i2    51.35   74      36      0       42      115     300     79      4e-20   84.0
BGIBMGA000034   c15905_g1_i1    36.00   125     79      1       44      167     493     119     4e-19   82.0
BGIBMGA000034   c15528_g1_i1    29.70   165     104     1       42      194     241     735     1e-15   75.1
BGIBMGA000034   c13162_g1_i3    42.65   68      39      0       129     196     2104    1901    2e-08   52.8
BGIBMGA000060   c15177_g1_i1    44.85   689     269     10      51      715     1423    3228    2e-177  561
BGIBMGA000060   c15177_g1_i1    37.74   106     61      1       286     386     1369    1686    2e-12   70.9
BGIBMGA000060   c15177_g1_i3    44.70   689     266     11      55      715     1435    3240    8e-176  556
BGIBMGA000060   c15177_g1_i3    37.74   106     61      1       286     386     1369    1686    2e-12   71.2
BGIBMGA000060   c15177_g1_i2    55.99   384     160     2       341     715     502     1653    3e-136  440
BGIBMGA000060   c15821_g1_i2    30.14   282     174     8       454     714     1955    2794    8e-24   108

The first columns are query IDs while the second column are matched IDs from database files, the values are normal Blast values.

Question: How can I get the code to also output the full sequence (including the header >xxx|yyy) of the matching database sequence as part of the result.

For example (output):

BGIBMGA000034   c9147_g1_i1     33.95   162     97      2       42      193     120     605     1e-22   93.2
>c9147_g1_i1
AGAGGACCATTGCACCGGGAAATTCCTCCCAATTGGGGTTCCAAAGGGGCCCCTTAAGGGAAGGAAGGTTTCCRCCGGCCTTTTTTCCCTTGGGGAAAGGGGTTTCCCCCAAGGTTCCAACCAATTGGGGGCCTTCTTTAAGGG........

Thanks

Blast bash alignment • 1.7k views
ADD COMMENT
0
Entering edit mode

You can't.

You need to run blastdbcmd to retrieve your sequences. Write the id's and the corresponding start and stop positions to a file and send that file as a query to blastdbcmd.

If I'm not mistaken I think the format of the query should be:

Accession_id_1 50 410

Accession_id_2 113 460

ADD REPLY

Login before adding your answer.

Traffic: 2183 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6