Convert CSV to FASTA format using Python
2.5 years ago

I have a TSV file and I want to convert it to FASTA. Please help.

Moderator edit note: I have removed the link as it provided a direct link to an archive download and could present a risk.

Example data should be provided as a text snippet

python • 2.8k views

Please include a short sample from your input and desired output.


SeqIO.convert doesn't work with TSV files. I would recommend iterating over the lines, splitting each one on the tab character, and writing the resulting fields to a new FASTA file.

Hope this helps.

ADD REPLY
0
Entering edit mode

SeqIO can read and write tab-delimited sequence files, and it is my understanding that a TSV file is a tab-delimited file, so I'm not sure what the issue would be.
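
For reference, a minimal sketch of what that could look like with Biopython's "tab" format, which expects exactly two tab-separated columns (identifier, then sequence) and no header row. The function name tab_to_fasta is made up for this example, and the file paths are whatever you pass in.

```python
def tab_to_fasta(in_path, out_path):
    """Convert a two-column tab-delimited file to FASTA using Biopython."""
    # Requires Biopython (pip install biopython); imported here so the
    # dependency is only needed when the function is actually called.
    from Bio import SeqIO

    # "tab" is Bio.SeqIO's two-column format: identifier<TAB>sequence per line.
    # SeqIO.convert returns the number of records converted.
    return SeqIO.convert(in_path, "tab", out_path, "fasta")
```

Calling tab_to_fasta("input.tsv", "output.fasta") should write one FASTA record per input line. Files with extra columns or a header row would need to be cleaned up first.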

5 months ago
Sareh ▴ 20

CSV file INPUT:

seq_name1,ACTACTACT
seq_name2,CGTCGTCGTCGT

FASTA file OUTPUT:

> seq_name1
ACTACTACT
> seq_name2
CGTCGTCGTCGT

METHOD with no packages

# Placeholder paths; replace these with your own files
in_path = "input.csv"
out_path = "output.fasta"

# Open the input and output files; "with" closes both automatically
with open(in_path, "r") as in_handle, open(out_path, "w") as out_handle:
    for line in in_handle:
        if not line.strip():
            continue  # skip blank lines
        line_tuple = line.strip("\n").split(',')  # for TSV just change this to split('\t')
        header = line_tuple[0]  # column 1
        seq = line_tuple[1]  # column 2

        # Write one FASTA record per input line
        out_handle.write("> {}\n{}\n".format(header, seq))

Watch for spaces before or after the comma in the CSV and within the sequence name; this code keeps them as they are, so strip them yourself if you don't want them in the output.


I'd recommend trimming leading and trailing whitespace (rtrim/ltrim, which is strip() in Python) once you get the header and seq, and also not adding the space after the > in the output. Otherwise, excellent solution!
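
A rough sketch of the answer's loop with both suggestions applied, still using no packages. The function name delimited_to_fasta and its delimiter parameter are made up for this example; it assumes two columns per line (name, then sequence) and no header row.

```python
def delimited_to_fasta(in_path, out_path, delimiter=","):
    """Convert a two-column delimited file (name, sequence) to FASTA."""
    n_written = 0
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        for line in in_handle:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split(delimiter)
            header = fields[0].strip()  # column 1, surrounding whitespace removed
            seq = fields[1].strip()     # column 2, surrounding whitespace removed
            # No space after ">" so the name is read as the record ID
            out_handle.write(">{}\n{}\n".format(header, seq))
            n_written += 1
    return n_written
```

For a TSV file, pass delimiter="\t". Spaces inside the sequence name are left alone here; only whitespace at the ends of each field is removed.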
