Question

Problems With Sequence Parts In Biopython

1

Entering edit mode

13.1 years ago

elmagodelabahia ▴ 60

Hi there!

I'm starting my work with 454 pyrosequencing amplicons. I am using Biophyton making my own scripts.

I have some questions: how I can search for a determinate sequence within a longer sequence? (e.g. find 'AGCT' within 'AAAGGGagctCCCTTT') I have split bigger sequence into each nucleotide with str(sequence.seq) command, but I am stuck at this point.

The other question is how I can make a counter properly. I have written down one, but instead of getting a proper count, I get a non-sense result. This is a bigger problem because I want to write down 3 files with only one program.

Here is my script:

from Bio import SeqIO

secuencias=SeqIO.parse("/Users/imac/Desktop/Programas_Python_glaciares/documento_prueba.fasta", "fasta")

archivo1=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo1.fasta", "w")
archivo2=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo2.fasta", "w")
archivo3=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo3.fasta", "w")


x=0  
y=0 
z=0
for linea in secuencias:

    dato=linea.description.split(" ")
    seqs=str(linea.seq)

    if "XXXXXXXXXXXXXXXXXX" in seqs:

        archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=x+1 
        y=0 
        z=0
    archivo1.write(str(x))


    if "YYYYYYYYYYYYYYYYYY" in seqs:

        archivo2.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
        x=0
        y=y+1
        z=0
    archivo2.write(str(y))


    if "ZZZZZZZZZZZZZZZZZZ" in seqs:

        archivo3.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

        x=0
        y=0
        z=z+1
    archivo3.write(str(z))

archivo1.close()
archivo2.close()
archivo3.close()

Sequences in the script are for a test file.

Thanks a lot

python biopython sequence split • 3.1k views

ADD COMMENT • link updated 10.2 years ago by Biostar 20 • written 13.1 years ago by elmagodelabahia ▴ 60

2

Entering edit mode

Why are you doing this:

archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")

Surely this would be simpler (and almost the same - this will preserve the full FASTA description):

SeqIO.write(linea, archivo1, "fasta")

Note that linea is a SeqRecord object.

ADD REPLY • link 13.1 years ago by Peter 6.0k

0

Entering edit mode

Can you edit your post to include all code in the code block?

ADD REPLY • link 13.1 years ago by Niek De Klein ★ 2.6k

0

Entering edit mode

if you only want to look at the EXACT matching, you can use string find function e.g

 def find_all(src, s):
    c=0; beg=0; slen=len(s)
    while1: 
        start=src.find(s, beg)
        if beg==-1: break
        c+=1; start+=slen
    return c

ADD REPLY • link 13.1 years ago by jingtao09 ▴ 110

score 2 · Answer 1 · 2012-06-13

2

Entering edit mode

13.1 years ago

Asaf 10k

Why do you set the counters to zero on each iteration? e.g.

   x=0
   y=0
   z=z+1

That's the reason your counter don't work. Other than that 'XXX' in seqs should work

ADD COMMENT • link 13.1 years ago by Asaf 10k