Question

Can't make program in Python to read FASTA

6.1 years ago

hdinis09 • 0

My professor wants us to write a program that analyses triplets in sequences. The first step is to open and read the actual sequence, and he provided the following code.

def read_FASTA(fname):
begin = True
prots = {}
fil = open(fname, "rt")
lins = fil.readlines()
fil.close()
for lin in lins:
    slin = lin.strip()
    if slin[0] == '>':
        if begin == False:
            prots[pname] = seq
        seq = ""
        pname = slin[1:].strip()
        begin = False
    else:
        seq = seq + slin
prots[pname] = seq
return prots

---SECOND CODE---

    import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta’)
for prot in prots:
    print(prot)

After downloading a FASTA file from UniProt (containing one or multiple sequences), this is what I have:

def read_FASTA("proteinas.fasta"):
    begin = True
    prots = {}
    fil = open("proteinas.fasta", "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

--Second result--

import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta')
for prot in prots:
    print(prot)

fasta python sequence • 5.5k views

I suspect this is just the tip of the iceberg, but at least part of your problem is that you are passing your file path with a mixture of quotes and backticks:

" = double quote
' = single quote
` = backtick

Do not mix these up. It should look something like:

function(r'C:\Path\to\file.fasta')  # the r prefix keeps backslashes such as \t and \f literal
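
A related Windows pitfall, sketched here as an aside: in an ordinary Python string literal the backslash starts an escape sequence, so `\t` inside a path silently becomes a tab. In Python 3, a literal that begins `'C:\Users...'` is rejected with a syntax error, because `\U` starts a Unicode escape. The paths below are made up for illustration and are not taken from the original post.

```python
# Hypothetical paths, for illustration only.
plain = 'C:\temp\file.fasta'       # \t becomes a tab and \f a form feed
raw = r'C:\temp\file.fasta'        # raw string: backslashes are kept as typed
escaped = 'C:\\temp\\file.fasta'   # doubled backslashes: same text as the raw string
forward = 'C:/temp/file.fasta'     # forward slashes are also accepted by Windows

print(repr(plain))     # shows the tab and form feed characters
print(repr(raw))
print(raw == escaped)
```

Any of the last three spellings should be safe to pass to `open()`; which one to use is mostly a matter of taste.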

——

Just a pointer about using the forum: please don't screenshot code or output. Instead, copy and paste the text and format it appropriately, as I have done above.

6.1 years ago by Joe 21k

But I did put in single quotes!

6.1 years ago by hdinis09 • 0

Hummm.. perhaps it was just that screenshot making it look odd then (another reason to copy the raw text).

And what was the error you were getting again?

6.1 years ago by Joe 21k

Hello hdinis09,

you forgot to tell us what your question/problem is :)

fin swimmer

6.1 years ago by finswimmer 16k

h.mon · Answer 1 · 2018-11-26

6.1 years ago

Dattatray Mongad ▴ 380

Use Biopython:

from Bio import SeqIO

# SeqIO.parse (lowercase "p") yields one record per FASTA entry
for record in SeqIO.parse("fastaFileName", "fasta"):
    print(record.id)
    print(record.seq)

This is the better suggestion, but as the task is an assignment with code given specifically, I'm guessing this isn't an option.

6.1 years ago by Joe 21k

score 1 · Answer 2 · 2018-11-26

You haven't shown us the project structure, and rfasta isn't an existing package, so I'm guessing the script itself is called rfasta and you're trying to import it locally?
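
If the `import rfasta` line itself is what fails, a check along these lines may help narrow it down. This is only a sketch and assumes Python 3; `module_is_importable` is a made-up helper name, not something from the assignment. Python looks for modules in the directories listed in `sys.path`, which normally includes the directory of the script being run (or the current working directory in an interactive session), so `rfasta.py` needs to sit in one of those.

```python
import importlib.util
import os
import sys


def module_is_importable(name):
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


# Show where Python is running and where it will look for modules.
print("Working directory:", os.getcwd())
print("First entries on sys.path:", sys.path[:3])
print("rfasta importable?", module_is_importable("rfasta"))
```

If the last line prints `False`, moving `rfasta.py` next to the calling script, or starting Python from the folder that contains it, is usually the simplest fix.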

The code works for me (after a little re-indentation compared to your post which was probably lost in translation).

Given the following:

Input file

>mutant
GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>gorrila
GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC
>chimpanze
GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>human
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>olive
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC

The 'reader' code in a file called `rfasta.py` in the current working directory:

def read_FASTA(fname):
    begin = True
    prots = {}
    fil = open(fname, "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

(This isn't particularly elegant Python IMO, but it works and is fine for an exercise.)
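
For comparison, a somewhat more idiomatic version might look like the sketch below. The name `read_fasta` is my own, not the assignment's. It behaves slightly differently from the code above: it skips blank lines (which would raise an `IndexError` at `slin[0]` in the original), ignores any text before the first header, and returns an empty dictionary for an empty file instead of failing.

```python
def read_fasta(fname):
    """Parse a FASTA file into a {header: sequence} dictionary."""
    records = {}
    name = None
    with open(fname, "rt") as handle:      # the file is closed automatically
        for line in handle:
            line = line.strip()
            if not line:
                continue                   # skip blank lines
            if line.startswith(">"):
                name = line[1:].strip()    # header text without the ">"
                records[name] = []
            elif name is not None:
                records[name].append(line) # sequence chunk for current header
    # Join the collected chunks once per record.
    return {name: "".join(chunks) for name, chunks in records.items()}
```

Collecting the chunks in a list and joining once avoids rebuilding the string on every line, though for small exercise files the difference is unlikely to matter.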

Running the code in a local python interpreter:

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> file = './seqs.fa'
>>> import rfasta
>>> p = rfasta.read_FASTA(file)
>>> print(p)
{'olive': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC', 'gorrila': 'GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'chimpanze': 'GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'mutant': 'GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'human': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC'}

As you can see, the FASTA file is parsed into a dictionary without any problem.
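
Since the stated goal is to analyse triplets, here is one way the resulting dictionary might be used as a next step. This is a rough sketch under assumptions: it takes "triplets" to mean non-overlapping three-letter chunks read from the first base, and both `count_triplets` and the example sequence are made up for illustration. Your assignment may define triplets differently.

```python
from collections import Counter


def count_triplets(seq):
    """Count non-overlapping 3-letter chunks, ignoring any leftover tail."""
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# A small stand-in for the dictionary returned by read_FASTA.
prots = {"example": "GTTGGGAGGGTT"}
for name, seq in prots.items():
    print(name, count_triplets(seq).most_common())
```

For overlapping triplets, stepping by 1 instead of 3 in the `range` call (and stopping at `len(seq) - 2`) would be the corresponding change.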

I was running this on a Linux box, so the file path syntax etc. will be different since you're doing it on Windows, but the principle is the same.
