I have some BLAST output in TSV format. The first column is the query_id, the second is the subject_id, and the third is the alignment. An example of the data:
Dictyostelium_gi | 550558531 | gb | Dictyostelium_gi | 550558531 |gb YKLADEQWSTTILLIMCQE..
Dictyostelium_gi | 550558531 | gb | Physarum_gi | 560558531 |gb DEPPALI-X-WSTTILLIMCQE..
I wrote a script to extract the lines whose query and subject IDs are equal, like the first line of the example, and to write a file containing the query_id and the alignment from each of those lines. Here is the script:
#! /usr/bin/env python3.5
import sys
blast_out = open('out.tsv','r')
for line in blast_out:
    line = line.split('\t')
    query_id = line[0]
    subject_id = line [1]
    alignment = line [2]
    if 'query_id' == 'subject_id' in line:
        output = line
        for i in output:
            seq_id = output[0]
            seq_seq = output[3]
            blast_output = open('blast_output','w')
            blast_output = ('>' + 'seq_id' + '\n' + 'seq_seq')
            return blast_output
I am having trouble generating the output I want: the lines from the TSV file whose query and subject IDs are equal, and a file containing the query_id and alignment from those lines. What are the issues in my script? Thank you.
Also, double-check what line = line.split('\t') returns.
Sometimes the list contains an empty element '' at the beginning, in which case you need to adjust the list indices accordingly.
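To illustrate the point about the empty element: a line that starts with a tab splits into a list whose first item is ''. The following is only a minimal sketch with made-up field values; the helper name split_fields is hypothetical and not part of the original script.

```python
def split_fields(line):
    """Split a TSV line into fields, dropping the newline and any leading empty fields."""
    fields = line.rstrip('\n').split('\t')
    # A line that begins with a tab yields '' as its first element,
    # which shifts every index by one if it is not removed.
    while fields and fields[0] == '':
        fields.pop(0)
    return fields


raw = '\tquery1\tsubject1\tYKLADEQW\n'  # made-up example line with a leading tab
print(raw.split('\t'))    # the first element here is ''
print(split_fields(raw))  # leading empty element removed
```

Whether dropping leading empty fields is appropriate depends on the file; if an empty first column is meaningful in your data, adjust the indices instead.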
I checked that, and the list indices seem to be correct. But I'm still not getting any output, which I find very puzzling.
There are some problems in your original code.
Try my posted code instead and see whether it works.
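For reference, the main problems in the original script are: 'query_id' == 'subject_id' compares two string literals (always False) instead of the variables; output[3] does not exist on a three-column line; the output file is reopened in 'w' mode and then the name is rebound to a string, so nothing is ever written; and return is used outside a function. Below is a minimal sketch of a corrected version, assuming three tab-separated columns as described in the question. The function name extract_self_hits is hypothetical.

```python
def extract_self_hits(in_path, out_path):
    """Write a FASTA-style record for each row whose query and subject IDs match.

    Assumes three tab-separated columns: query_id, subject_id, alignment.
    Returns the number of records written.
    """
    n_written = 0
    # Open the output once, outside the loop, so earlier records are not overwritten.
    with open(in_path) as blast_out, open(out_path, 'w') as fasta_out:
        for line in blast_out:
            fields = line.rstrip('\n').split('\t')
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            query_id, subject_id, alignment = (f.strip() for f in fields[:3])
            # Compare the variables themselves, not the quoted names.
            if query_id == subject_id:
                fasta_out.write('>' + query_id + '\n' + alignment + '\n')
                n_written += 1
    return n_written
```

With the file names from the question, this would be called as extract_self_hits('out.tsv', 'blast_output'). A return value of 0 means no line had matching IDs in its first two tab-separated fields.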
The script runs completely. However, the output file 'blast_output' is empty.
Try adding:
print(output)
to the code, to see whether this list is empty.
I debugged it: output is empty, but everything else shows results when I print it.
Then there must be something wrong with how the file is parsed. Could you paste some lines of your input file? I don't see any tabs in your posted example:
Dictyostelium_gi | 550558531 | gb | Dictyostelium_gi | 550558531 |gb YKLADEQWSTTILLIMCQE..
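One way to check whether the real file actually contains tabs is to look at each line with repr(), which shows a tab character as \t, and to count the fields a tab split produces. This is only a sketch; the helper name inspect_lines is made up, and the file name in the commented usage is the one from the question.

```python
from itertools import islice


def inspect_lines(path, n=5):
    """Return (field_count, repr_of_line) for the first n lines of a file."""
    report = []
    with open(path) as handle:
        for line in islice(handle, n):
            # repr() makes tabs and newlines visible as \t and \n.
            report.append((len(line.split('\t')), repr(line)))
    return report


# Example use (assumes out.tsv exists in the working directory):
# for count, shown in inspect_lines('out.tsv'):
#     print(count, shown)
```

A field count of 1 for every line would suggest the columns are separated by spaces or another character rather than tabs.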
I corrected some errors in my previous code. However:
your posted example is not well formatted. I tried splitting the line on tabs, and it doesn't work. If that is due to a copy/paste problem, try my updated code to see whether it works on your real file.
Besides, if your file is clean, using awk as in Pierre Lindenbaum's comment is much simpler than writing a Python script.
The code and Pierre helped. Thanks a bunch!