Question

Python Script for extracting out Query and Alignment sequence from an all_v_all Blast

7.6 years ago

apulunuj ▴ 30

I have some BLAST output in TSV format. The first column is the query_id, the second is the subject_id, and the third is the alignment. An example of the data:

Dictyostelium_gi | 550558531 | gb |   Dictyostelium_gi | 550558531 |gb  YKLADEQWSTTILLIMCQE..


Dictyostelium_gi | 550558531 | gb |   Physarum_gi | 560558531 |gb  DEPPALI-X-WSTTILLIMCQE..

I wrote a script to extract the lines where the query and subject IDs are equal, as in the first line of the example, and to write a file containing the query_id and the alignment from each such line. Here is the script:

#!/usr/bin/env python3.5

import sys

blast_out = open('out.tsv','r')

for line in blast_out:

   line = line.split('\t')
   query_id = line[0]
   subject_id = line [1]
   alignment = line [2]
   if 'query_id' == 'subject_id' in line:
      output = line
      for i in output:
         seq_id = output[0]
         seq_seq = output[3]
         blast_output = open('blast_output','w')
         blast_output = ('>' + 'seq_id' + '\n' + 'seq_seq')
       return blast_output

I am having trouble generating the output I want: the lines from the TSV file whose query and subject IDs are equal, and a file containing the query_id and alignment from those lines. What are the issues in my script? Please and thank you.

python blast • 4.1k views

ADD COMMENT • link updated 7.6 years ago by Pierre Lindenbaum 166k • written 7.6 years ago by apulunuj ▴ 30

score 1 · Answer 1 · 2017-09-30

7.6 years ago

shoujun.gu ▴ 380

Try something like this:

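records = []
with open('Your_file', 'r') as f:
    for line in f:
        # Strip the trailing newline, then split the row into tab-separated columns
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        query_id, subject_id, alignment = fields[0], fields[1], fields[2]
        # Keep only self-hits, where the query and subject IDs are identical
        if query_id == subject_id:
            # FASTA-style record: header line, then the alignment (third column)
            records.append('>' + query_id + '\n' + alignment + '\n')

with open('blast_output', 'w') as out_file:
    out_file.writelines(records)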

ADD COMMENT • link 7.6 years ago by shoujun.gu ▴ 380

Also, double-check what line.split('\t') returns.

Sometimes the list contains an empty element '' at the beginning (for example, when the line starts with a tab). In that case you need to adjust the list indices accordingly.
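A quick illustration of that behaviour, as a minimal sketch with made-up rows (the IDs and sequences below are hypothetical): a leading tab yields an empty first element, and the last field keeps its trailing newline unless the line is stripped first.

```python
# Hypothetical rows for illustration; real BLAST output will differ.
clean_row = "seqA\tseqA\tYKLADEQW\n"
leading_tab_row = "\tseqA\tseqA\tYKLADEQW\n"

# Splitting as-is: the last element still carries the newline.
clean_fields = clean_row.split("\t")

# A leading tab produces an empty string at index 0,
# which shifts every column index by one.
shifted_fields = leading_tab_row.split("\t")

# Stripping the newline before splitting gives clean fields.
stripped_fields = clean_row.rstrip("\n").split("\t")

print(clean_fields)
print(shifted_fields)
print(stripped_fields)
```

Printing the list once per line like this is usually enough to see whether the indices used in the script point at the intended columns.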

ADD REPLY • link 7.6 years ago by shoujun.gu ▴ 380

I checked that, the list index seems to be correct. But I'm still not getting any output. I find it very puzzling.

ADD REPLY • link 7.6 years ago by apulunuj ▴ 30

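There are some problems in your original code: for example, 'query_id' == 'subject_id' compares two string literals rather than the variables, and return is used outside of a function.

Try my posted code instead to see whether it works.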

ADD REPLY • link 7.6 years ago by shoujun.gu ▴ 380

The script runs completely. However, the output file 'blast_output' is empty.

ADD REPLY • link 7.6 years ago by apulunuj ▴ 30

Try adding:

print(output)

to the code, to see whether this list is empty.

ADD REPLY • link 7.6 years ago by shoujun.gu ▴ 380

I debugged it: output is empty, but everything else shows results when I print it.

ADD REPLY • link 7.6 years ago by apulunuj ▴ 30

Then there must be something wrong with how the file is parsed. Could you paste some lines of your input file? I don't see any tabs in your posted example:

Dictyostelium_gi | 550558531 | gb | Dictyostelium_gi | 550558531 |gb YKLADEQWSTTILLIMCQE..
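One way to check whether the real file is tab-delimited is to print the raw representation of its first few lines, where tabs show up as \t. This is only a sketch: the helper names count_columns and preview are made up for illustration, and 'out.tsv' is the filename used in the question.

```python
def count_columns(line, sep="\t"):
    """Return how many fields a line splits into on the given separator."""
    return len(line.rstrip("\n").split(sep))


def preview(path, n=3):
    """Print the repr and column count of the first n lines of a file."""
    with open(path, "r") as handle:
        for i, line in enumerate(handle):
            if i >= n:
                break
            # repr() makes tabs visible as \t and newlines as \n
            print(repr(line), "->", count_columns(line), "column(s)")


# Usage (assumes the file from the question exists in the working directory):
# preview("out.tsv")
```

If every line reports a single column, the file is not tab-separated and the split('\t') call can never yield separate query and subject IDs.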

ADD REPLY • link 7.6 years ago by shoujun.gu ▴ 380

I corrected some errors in the code I posted earlier. However:

Your posted example is not well formatted: I tried to split the line on tabs, and it doesn't work. If that is due to a copy/paste problem, try my updated code to see whether it works on your real file.

Besides, if your file is clean, using awk as in Pierre Lindenbaum's answer is much simpler than writing a Python script.

ADD REPLY • link 7.6 years ago by shoujun.gu ▴ 380

The code and Pierre helped. Thanks a bunch!

ADD REPLY • link 7.6 years ago by apulunuj ▴ 30

score 1 · Answer 2 · 2017-09-30

7.6 years ago

Pierre Lindenbaum 166k

The following awk script should do the job:

awk -F '\t' '{if($1==$2) printf(">%s\n%s\n",$1,$3);}'

However, in your example the two columns are not strictly identical: there is an extra '|' at the end of the first column...
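If that stray trailing '|' really is present in the file, an exact comparison will never match. Below is a rough Python sketch of the same filter that normalizes both IDs before comparing them. The function names normalize_id and self_hits_to_fasta are hypothetical, and removing whitespace plus a trailing '|' is an assumption about how the IDs differ, not something confirmed in this thread.

```python
def normalize_id(seq_id):
    """Drop whitespace and any trailing '|' so near-identical IDs compare equal."""
    return "".join(seq_id.split()).rstrip("|")


def self_hits_to_fasta(lines):
    """Yield FASTA-style records for rows whose query and subject IDs match."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        query_id, subject_id, alignment = fields[:3]
        if normalize_id(query_id) == normalize_id(subject_id):
            yield ">{}\n{}\n".format(normalize_id(query_id), alignment)


# Hypothetical rows modelled on the example in the question (tabs assumed).
rows = [
    "Dictyostelium_gi | 550558531 | gb |\tDictyostelium_gi | 550558531 |gb\tYKLADEQWSTTILLIMCQE\n",
    "Dictyostelium_gi | 550558531 | gb |\tPhysarum_gi | 560558531 |gb\tDEPPALI-X-WSTTILLIMCQE\n",
]
fasta = list(self_hits_to_fasta(rows))
print("".join(fasta), end="")
```

With a real file, the list of rows could be replaced by an open file handle, since the function only iterates over lines.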