Hi there!
I'm starting my work with 454 pyrosequencing amplicons. I am using Biophyton making my own scripts.
I have some questions: how I can search for a determinate sequence within a longer sequence? (e.g. find 'AGCT' within 'AAAGGGagctCCCTTT') I have split bigger sequence into each nucleotide with str(sequence.seq) command, but I am stuck at this point.
The other question is how I can make a counter properly. I have written down one, but instead of getting a proper count, I get a non-sense result. This is a bigger problem because I want to write down 3 files with only one program.
Here is my script:
from Bio import SeqIO
secuencias=SeqIO.parse("/Users/imac/Desktop/Programas_Python_glaciares/documento_prueba.fasta", "fasta")
archivo1=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo1.fasta", "w")
archivo2=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo2.fasta", "w")
archivo3=open("/Users/imac/Desktop/Programas_Python_glaciares/archivo_nuevo3.fasta", "w")
x=0
y=0
z=0
for linea in secuencias:
dato=linea.description.split(" ")
seqs=str(linea.seq)
if "XXXXXXXXXXXXXXXXXX" in seqs:
archivo1.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
x=x+1
y=0
z=0
archivo1.write(str(x))
if "YYYYYYYYYYYYYYYYYY" in seqs:
archivo2.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
x=0
y=y+1
z=0
archivo2.write(str(y))
if "ZZZZZZZZZZZZZZZZZZ" in seqs:
archivo3.write(">" + str(dato[0]) + " " + str(dato[1]) + "\n" + str(linea.seq) + "\n")
x=0
y=0
z=z+1
archivo3.write(str(z))
archivo1.close()
archivo2.close()
archivo3.close()
Sequences in the script are for a test file.
Thanks a lot
Why are you doing this:
Surely this would be simpler (and almost the same - this will preserve the full FASTA description):
Note that linea is a SeqRecord object.
Can you edit your post to include all code in the code block?
if you only want to look at the EXACT matching, you can use string find function e.g