How To Add An Attribute To A Sequence Header (Of Fasta) Based On A Match Off Another Fasta?
1
1
Entering edit mode
11.1 years ago
microbeatic ▴ 80

I have a pair of fasta file. Both of them have common sequences and sequences that are not common. In the sequence header of sequences in Fasta#1 there is an extra element that is not present in Fasta#2. For example orfg2 in the first sequence.

I want to create a 3rd fasta element (orfg#) to sequences in Fasta#2 along with other sequence that is only present in Fasta#2.

Fasta #1

gi|386029701|ref|YP_005950476.1| putative terminase [Oligotropha carboxidovorans OM4]|orfg2|1576637:1577942 MSPTQRLYILARLNRDELEALHSDWQLFAHPHQWPPDLAANDKPWTTWLLIGGRGAGKTRAG gi|386029703|ref|YP_005950478.1| phage portal protein [Oligotropha carboxidovorans OM4]|orfg3|1578528:1579704 MLNRLKHFFAAPEAKASRTAQVLAFASGGRARWMPRDYATLAREGYLANAVVHRAVRLIAESAA gi|386029705|ref|YP_005950480.1| putative head maturation protease [Oligotropha carboxidovorans OM4]|orfg4|1580448:1580964 MHAPLPSASRVSFTGDGAVEGYASLFGAVDQARDMVMPGAFTQTLQSRGLRRIPMLFQHDPS gi|386029707|ref|YP_005950482.1| phage major capsid protein [Oligotropha carboxidovorans OM4]|orfg5|1581800:1583069 MDYDIQDHAPEHKSGISAARGEHDQIVQMFEEFKAANDERLAALGRRADVLLEEKVDRINGALD gi|386029711|ref|YP_005950486.1| hypothetical protein OCA4_c14650 [Oligotropha carboxidovorans OM4]|orfg6|1585092:1585662 MAAILLTPPESEPLSLSDAKAYLRVETDDDDAVIASLIAAARSHIEAMSGCAMLTQTWRLVRDCW

Fasta#2

gi|GI:386029701|ref|YP_005950476.1|putative terminase|Oligotropha carboxidovorans OM4 chromosome, complete genome.|1576637:1577942 MSPTQRLYILARLNRDELEALHSDWQLFAHPHQWPPDLAANDKPWTTWLLIGGRGAGKTRAG gi|GI:386029702|ref|YP_005950477.1|putative endonuclease|Oligotropha carboxidovorans OM4 chromosome, complete genome.|1578089:1578377 MDSTFFVYVLANCPRGVLYVGVTSDLSRRLSEHRVKLVAGFTATYGVTNLVHVETYASIIEARARE gi|GI:386029703|ref|YP_005950478.1|phage portal protein|Oligotropha carboxidovorans OM4 chromosome, complete genome.|1578528:1579704 MLNRLKHFFAAPEAKASRTAQVLAFASGGRARWMPRDYATLAREGYLANAVVHRAVRLIAESAA gi|GI:386029704|ref|YP_005950479.1|hypothetical protein|Oligotropha carboxidovorans OM4 chromosome, complete genome.|1579777:1580038 MAPVQRCSMKNTAPRPGKGAMIDLIRIFVERGDLAHLALFLWAAVASTAFLVTLRELAVASKRFD gi|GI:386029705|ref|YP_005950480.1|putative head maturation protease|Oligotropha carboxidovorans OM4 chromosome, complete genome.|1580448:1580964

Here is my code. But, it only adds attribute to first three matches and quits.

#! /usr/bin/python

from Bio import SeqIO,SeqFeature
import sys

Fasta1=SeqIO.parse(open(sys.argv[1],"rU"),"fasta")

Fasta2=SeqIO.parse(open(sys.argv[2],"rU"),"fasta")

for seq_hits in Fasta1:
    gid_hits=str(seq_hits.id.split("|")[1])
    orf=seq_hits.id.split("|")[5]
    print orf
    for seq_cluster in Fasta2:
        gid_cluster=seq_cluster.id.split("|")[1]
        gi_cluster=str(gid_cluster.split(":")[1])
        if gi_cluster == gid_hits:
            print (">%s|%s\n%s" % (seq_cluster.description,orf,seq_cluster.seq))
        else:
            break
biopython fasta • 2.9k views
ADD COMMENT
1
Entering edit mode

put this line:

Fasta2=SeqIO.parse(open(sys.argv[2],"rU"),"fasta")

inside the first loop

ADD REPLY
0
Entering edit mode

Ok. It worked. I also took out the last two lines. Thanks much.

ADD REPLY
0
Entering edit mode

I just realized that now it only prints the sequences with matches, and the one that doesn't match are not printed. Any suggestions??

ADD REPLY
0
Entering edit mode
11.1 years ago
microbeatic ▴ 80

With the help of @brentp this worked. Here is the corrected script.

#! /usr/bin/python
from Bio import SeqIO,SeqFeature
import sys

Fasta1=SeqIO.parse(open(sys.argv[1],"rU"),"fasta")

for seq_hits in Fasta1:
    gid_hits=str(seq_hits.id.split("|")[1])
    orf=seq_hits.id.split("|")[5]
    Fasta2=SeqIO.parse(open(sys.argv[2],"rU"),"fasta")
    for seq_cluster in Fasta2:
        gid_cluster=seq_cluster.id.split("|")[1]
        gi_cluster=str(gid_cluster.split(":")[1])
        if gi_cluster == gid_hits:
            print (">%s|%s\n%s" % (seq_cluster.description,orf,seq_cluster.seq))
ADD COMMENT

Login before adding your answer.

Traffic: 2326 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6