Beginner at Python: how do I convert a FASTA file to a PHYLIP (tab-delimited) file?
8.1 years ago
oki4 ▴ 10

My code so far:

temp_line = ''
out_lines = []
with open('dna.fasta.py', 'r') as f_in:
    text = f_in.read()
    text = text.split()
    print(text)
    for line in text:
        line = line.strip('\n')
        if line[0] == '>':
            temp_line = line.strip('>')
            out_lines.append(temp_line)
        else:
            out_lines.append(line)


print (out_lines)
print (temp_line)
with open('dna.phylip.py', 'w') as file_out:
    file_out.write('\n'.join(out_lines))

My Input file: dna.fasta.py

>Human
AGCATGCATCGATCGATCGACTAGCTAGCG
>Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
>Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

My output file so far: dna.phylip.py

Human
AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

Correct format of the output PHYLIP file (what it should look like):

Human     AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp     GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla     TGTGTCGATCTCGAGCTGAGTCGTCTATCA

I have no idea how to remove the '\n' and add a '\t'. To remove the newline I've tried the strip and split functions, and to add the tab I've tried joining with '\t', but neither has worked. I don't know what I'm doing wrong.

python fasta • 3.8k views
8.1 years ago

From your program, it looks like you have not properly understood the fasta format.

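A FASTA file can also look like this, with several words in the header and a sequence wrapped over multiple lines: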

>1:id bla bla | bla
ATTATTGTTG
ATTATTGT
ATGGAT
ATTAGGGAGGGGATTA

>2:id bla bla | bla
AGGATTGCGGCTAGG

If you don't mind using BioPython:

import sys
from Bio import SeqIO

# Print each record as "id<TAB>sequence"; the FASTA path is the first command-line argument.
for record in SeqIO.parse(sys.argv[1], "fasta"):
    print(str(record.id) + "\t" + str(record.seq))

Calling text = text.split() does not work for FASTA in general: it splits on any whitespace, so headers that contain spaces are broken apart and wrapped sequence lines are never joined. You need to look for a line that starts with ">" and then concatenate the following lines until you reach another line that starts with ">". You can search for how to handle FASTA with Python on Google or on this site.
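
The approach described above can be sketched in plain Python, without BioPython. This is only a minimal sketch: the function name fasta_to_tab and the choice to keep the whole header line as the name are assumptions made for illustration, not something from the original post.

```python
def fasta_to_tab(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Sequence lines are concatenated until the next '>' header is seen.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith('>'):
            # a new record starts: emit the previous one, if any
            if header is not None:
                yield header, ''.join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # emit the last record
    if header is not None:
        yield header, ''.join(chunks)


example = """>1:id bla bla | bla
ATTATTGTTG
ATTATTGT
ATGGAT
ATTAGGGAGGGGATTA

>2:id bla bla | bla
AGGATTGCGGCTAGG
""".splitlines()

records = list(fasta_to_tab(example))
for name, seq in records:
    print(name + '\t' + seq)
```

To convert a real file, the same generator could be fed an open file object instead of the example list, writing name + '\t' + seq + '\n' to the output file for each pair.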

8.1 years ago

You can do this with the BBMap package like this:

reformat.sh in=file.fasta out=file.oneline

8.1 years ago

If the task matters more than the code or language, here is a Perl one-liner:

perl -ne 'if($_=~s/^>//){chomp($_), print $_} else{print "\t".$_}' file.fa > out.fa

file.fa

>Human
AGCATGCATCGATCGATCGACTAGCTAGCG
>Chimp
GATATGTCGAGATCGTCAGCTCGATCAGCT
>Gorilla
TGTGTCGATCTCGAGCTGAGTCGTCTATCA

out.fa

Human     AGCATGCATCGATCGATCGACTAGCTAGCG
Chimp     GATATGTCGAGATCGTCAGCTCGATCAGCT
Gorilla     TGTGTCGATCTCGAGCTGAGTCGTCTATCA

PS: Though, I love Python


It does not work for multi-line FASTA files, and it also does not work if the FASTA header contains several words, like the one below:

>1:id bla bla | bla
ATTATTGTTG
ATTATTGT
ATGGAT
ATTAGGGAGGGGATTA

>2:id bla bla | bla
AGGATTGCGGCTAGG

Hi Goutham,

Thanks for pointing that out :)

I fixed the problems; here is the new one-liner:

perl -ne 'if($_=~s/^>//){chomp($_), print "\n".$_." "} else{chomp($_),print $_}' file.fa

I hope it is helpful for the person who posted the question.

8.1 years ago
Zaag ▴ 870
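with open('dna.fasta.py', 'r') as f, open('dna.phylip.py', 'w') as file_out:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which would break the header test
        if line.startswith('>'):
            # header line: write the name followed by a tab
            file_out.write('{}\t'.format(line.lstrip('>')))
        else:
            # sequence line: write it and end the row
            file_out.write('{}\n'.format(line))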

Thank you! Why do you have to include the {} (brackets) when writing into the output file?


https://docs.python.org/3/library/string.html

You could just print the variable with a tab or newline.
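
To illustrate the point about the braces: {} is a placeholder that str.format() replaces with its argument, so the two spellings below should build the same string. This is just a small sketch; the variable names are made up for the example.

```python
name = 'Human'
seq = 'AGCATGCATCGATCGATCGACTAGCTAGCG'

# '{}' is a replacement field: format() substitutes its arguments in order.
with_format = '{}\t{}\n'.format(name, seq)

# Plain concatenation builds the same string without any braces.
with_concat = name + '\t' + seq + '\n'

print(with_format == with_concat)
```

Either string can be passed to file_out.write(); which one to use is a matter of taste.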
