SeqIO can read and write tab-delimited sequence files, and it is my understanding that a TSV file is a tab-delimited file, so I'm not sure what the issue would be.
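For reference, here is a minimal sketch of the SeqIO route, assuming Biopython is installed and the input has exactly two tab-separated columns per line (identifier, then sequence) with no header row. The function name tsv_to_fasta is hypothetical; "tab" is the format name Biopython uses for this kind of file.

```python
def tsv_to_fasta(tsv_path, fasta_path):
    """Convert a two-column tab-delimited file to FASTA with Biopython.

    Hypothetical helper: assumes each line is "<identifier>\t<sequence>".
    Returns the number of records written.
    """
    # Imported inside the function so the helper can be defined
    # even on a machine where Biopython is not installed yet.
    from Bio import SeqIO

    # SeqIO.convert(in_file, in_format, out_file, out_format)
    return SeqIO.convert(tsv_path, "tab", fasta_path, "fasta")
```

If the file has a header row, extra columns, or commas instead of tabs, this sketch will likely not parse it as-is, and a manual line-by-line approach may be the easier fix.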
# OPEN IN & OUT files
in_handle = open(in_path, "r")
out_handle = open(out_path, "w")
# read in the file line by line
for line in in_handle:
    line_tuple = line.strip("\n").split(',')  # for TSV just change this to split('\t')
    header = line_tuple[0]  # column 1
    seq = line_tuple[1]  # column 2
    # WRITE out the record in FASTA format
    out_handle.write("> {}\n{}\n".format(header, seq))
in_handle.close()
out_handle.close()
Be aware of any spaces before or after the comma in a CSV, and within the sequence name, in case you want to remove them.
written 5 months ago by Sareh, updated 5 months ago by Ram
I'd recommend stripping leading and trailing whitespace from the header and seq once you have them (Python's equivalent of rtrim/ltrim is .strip()), and also not adding the space after the > in the output. Otherwise, excellent solution!
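A rough sketch of what those two suggestions could look like, using only the standard library. The function name delimited_to_fasta and its delimiter parameter are made up for illustration, and the sketch assumes the name is in column 1 and the sequence in column 2.

```python
def delimited_to_fasta(in_path, out_path, delimiter="\t"):
    """Write column 1 (name) and column 2 (sequence) of each line as FASTA.

    Hypothetical helper; pass delimiter="," for a CSV.
    Returns the number of records written.
    """
    count = 0
    with open(in_path, "r") as in_handle, open(out_path, "w") as out_handle:
        for line_number, line in enumerate(in_handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split(delimiter)
            if len(fields) < 2:
                raise ValueError(
                    "line {}: expected at least 2 columns".format(line_number)
                )
            # strip() removes spaces (and a stray "\r") on both sides
            header = fields[0].strip()
            seq = fields[1].strip()
            # no space after ">" in the FASTA header line
            out_handle.write(">{}\n{}\n".format(header, seq))
            count += 1
    return count
```

Called as delimited_to_fasta("input.tsv", "output.fasta"), this strips only the outer whitespace; spaces inside a sequence name are left alone.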
Please include a short sample from your input and desired output.
https://biopython.org/wiki/SeqIO
SeqIO.convert doesn't work with TSVs. I would recommend iterating over the lines, splitting each one on the tab, and then writing the resulting fields out to a new FASTA file.
Hope this helps.