Question

Beginner at Python: my task is to convert a FASTA file to a PHYLIP file (tab-delimited), and I cannot figure out how to do so

8.5 years ago

oki4 ▴ 10

My code so far:

temp_line = ''
out_lines = []
with open('dna.fasta.py', 'r') as f_in:
    text = f_in.read()
    text = text.split()
    print(text)
    for line in text:
        line = line.strip('\n')
        if line[0] == '>':
            temp_line = line.strip('>')
            out_lines.append(temp_line)
        else:
            out_lines.append(line)


print (out_lines)
print (temp_line)
with open('dna.phylip.py', 'w') as file_out:
    file_out.write('\n'.join(out_lines))

My Input file: dna.fasta.py

>Human
AGCATGCATCGATCGATCGACTAGCTAGCG
>Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
>Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

My output file so far: dna.phylip file

Human
AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

Correct Format of the output phylip file/ What it should look like:

Human     AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp     GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla     TGTGTCGATCTCGAGCTGAGTCGTCTATCA

I have no idea how to remove the '\n' and add '\t'. I've tried the strip and split functions to remove the newline, and I've tried joining with '\t' to add the tab, but my methods haven't worked. I don't know what I'm doing wrong.

python fasta • 4.1k views

score 1 · Answer 1 · 2016-12-01

From your program, it looks like you have not properly understood the fasta format.

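A FASTA file can also look like this, with several words in the header and a sequence wrapped over multiple lines: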

>1:id bla bla | bla
ATTATTGTTG
ATTATTGT
ATGGAT
ATTAGGGAGGGGATTA

>2:id bla bla | bla
AGGATTGCGGCTAGG

If you don't mind using Biopython:

import sys
from Bio import SeqIO

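# usage: python script.py input.fasta > output.txt
for record in SeqIO.parse(sys.argv[1], "fasta"):
    # record.id is the first word of the header line
    print(str(record.id) + "\t" + str(record.seq))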

The line text = text.split() does not work for FASTA in general, because it splits on all whitespace, so a header that contains spaces is broken into several pieces. You need to look for a line that starts with ">" and then concatenate the following lines until you reach another line that starts with ">". You can search Google or this site for how to handle FASTA with Python.
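
The approach described above (start a new record at each line beginning with ">", then join the following lines until the next ">") could be sketched in plain Python roughly like this. It is a minimal sketch, not a full FASTA parser: the function name fasta_to_tab is made up for illustration, and it assumes the record name is the first whitespace-delimited word of the header, similar to what Biopython's record.id gives you.

```python
import io


def fasta_to_tab(lines):
    """Yield 'name<TAB>sequence' rows from an iterable of FASTA lines.

    Hypothetical helper for illustration; keeps only the first word of
    each header as the name.
    """
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith('>'):
            if name is not None:
                # a new header closes the previous record
                yield name + '\t' + ''.join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ''  # empty header becomes ''
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if name is not None:
        yield name + '\t' + ''.join(chunks)  # flush the last record


example = io.StringIO(
    ">1:id bla bla | bla\n"
    "ATTATTGTTG\n"
    "ATTATTGT\n"
    "\n"
    ">2:id bla bla | bla\n"
    "AGGATTGCGGCTAGG\n"
)
rows = list(fasta_to_tab(example))
for row in rows:
    print(row)
```

To convert a real file, you could pass an open file handle instead of the StringIO object and write '\n'.join(rows) plus a final newline to the output file.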

score 1 · Answer 2 · 2016-12-02

8.5 years ago

Brian Bushnell 20k

You can do this with the BBMap package like this:

reformat.sh in=file.fasta out=file.oneline

score 0 · Answer 3 · 2016-12-02

8.5 years ago

lakhujanivijay 5.9k

If the task matters more than the code or language, here is a Perl one-liner:

perl -ne 'if($_=~s/^>//){chomp($_), print $_} else{print "\t".$_}' file.fa > out.fa

file.fa

>Human
AGCATGCATCGATCGATCGACTAGCTAGCG
>Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
>Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

out.fa

Human     AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp     GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla     TGTGTCGATCTCGAGCTGAGTCGTCTATCA

PS: Though, I love Python

It does not work for multi-line FASTA files. It also does not work if the FASTA header contains several words, as in the example below:

>1:id bla bla | bla
ATTATTGTTG
ATTATTGT
ATGGAT
ATTAGGGAGGGGATTA

>2:id bla bla | bla
AGGATTGCGGCTAGG

ADD REPLY • link 8.5 years ago by GouthamAtla 12k

Hi Goutham,

Thanks for pointing that out :)

Fixed the problems, here is the new one liner:

perl -ne 'if($_=~s/^>//){chomp($_), print "\n".$_." "} else{chomp($_),print $_}' file.fa

I hope it is helpful for the person who posted the question.

ADD REPLY • link 8.5 years ago by lakhujanivijay 5.9k

score 0 · Answer 4 · 2016-12-02

8.5 years ago

Zaag ▴ 870

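with open('dna.fasta.py', 'r') as f, open('dna.phylip.py', 'w') as file_out:
    for line in f:
        line = line.strip()  # drop the trailing newline and surrounding whitespace
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError on them
        print(line)  # echo the line for debugging
        if line[0] == '>':
            # header: write the name without '>' followed by a tab, no newline
            file_out.write('{}\t'.format(line.lstrip('>')))
        else:
            # sequence: finish the row with a newline (assumes one sequence line per record)
            file_out.write('{}\n'.format(line))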

Thank you! Why do you have to include the {} (brackets) when writing into the output file?
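
For reference, a small illustration of what the braces do: in str.format, {} is a placeholder that is replaced by the argument, so '{}\t'.format(name) builds the name followed by a tab. The variable names below are made up for the example. The braces are not strictly required; plain concatenation or an f-string should give the same string.

```python
name = 'Human'
seq = 'AGCATGCATCGATCGATCGACTAGCTAGCG'

# '{}' marks where format() inserts its argument
with_format = '{}\t'.format(name) + '{}\n'.format(seq)

# equivalent ways to build the same row
with_concat = name + '\t' + seq + '\n'
with_fstring = f'{name}\t{seq}\n'

print(repr(with_format))
```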