How to change the IDs of a number of sequences
8.7 years ago

I have a list of sequences whose IDs I want to change. I have a tab-separated file that contains their old IDs and the new IDs (my IDs).

For example, I have a sequence like

>Aquca_005_00546.1
KHMALQFAMNAMDELMKMCQMNEPLWIPNNSGTKEMLNMEEHAKMFPWLTNFKQQHSQVRTEATRDSAVVIMNSITLTDAFLDVNKWMDIFPSIISRAKTVQIISSGIAGHASGSLHLMYAELQVQSPLVPTREAHFLRYCQQNAEEGTWAIVDFPIDSFHDSLQYSFPRYRRRPSGCLIQDMPNGYSRVTWVEHAEVEDKPVHQIFNHFVNSGTAFGAQRWLAVLQQQC
>Aquca_014_00016.1
DGWKVLTFENGVEISKRTSASFHIFRSRWLLKSVSPQQFITVANAIDAAKQWDSDLVEAKYIKDLEDNLSIIRLRFGDGSKPLFKNREFIVYERRETMADGTLVVAVASLPKEIAAGLHPKGNNTIRGLLLQSGWVVEELGDDENSCMVTYVVQLDPAGWLPKFFVNRLNTKLVMIIDNLEKL

I want to replace their original IDs with my IDs. For example, Aquca_005_00546.1 should become RaAc00546A and Aquca_014_00016.1 should become RaAc00016E. My tab-separated file has

Original ids	my ids
Aquca_005_00546.1	RaAc00546A
Aquca_014_00016.1	RaAc00016E

The original IDs and my IDs are in a tab-separated file, aligned line by line (Aquca_005_00546.1 = RaAc00546A).

linux perl shell

Since there is a perl tag, take a look at the BioPerl module Bio::SeqIO. Also take a look at this (pure solution).

8.7 years ago

Quick (bio)python script tested for your limited sample data:

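from Bio import SeqIO
import sys

def changeids(fasfile, iddict):
    """Write the records of fasfile to adaptedIDs.fa with IDs replaced via iddict."""
    outlist = []
    for seq_record in SeqIO.parse(fasfile, "fasta"):
        # Clear the description so the old header text is not written after the new ID
        seq_record.description = ""
        # Raises KeyError if an ID in the FASTA file is missing from the mapping
        seq_record.id = iddict[seq_record.id]
        outlist.append(seq_record)
    SeqIO.write(outlist, "adaptedIDs.fa", "fasta")

def extractdict(identifierfile):
    """Read a two-column, tab-separated file into a dict {old ID: new ID}."""
    iddict = {}
    with open(identifierfile) as idfile:
        for line in idfile:
            if not line.strip():
                continue  # skip blank lines
            # Strip each field so stray spaces around an ID do not break the lookup
            old_id, new_id = (field.strip() for field in line.split('\t')[:2])
            iddict[old_id] = new_id
    return iddict

if __name__ == "__main__":
    iddict = extractdict(sys.argv[2])
    changeids(sys.argv[1], iddict)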

Save as changeids.py and run it as python changeids.py yourfas.fa yourids.txt. The renamed sequences are written to adaptedIDs.fa in the current directory.

yourids.txt is expected to be a tab-separated file without a header line, with the current identifiers in column 1 and the identifiers you want in column 2, and nothing else. Requires Biopython.
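If installing Biopython is not an option, the same renaming can be sketched with the standard library only. This is a rough illustration rather than a tested replacement for the script above: the function names load_id_map and rename_headers are made up for this example, the sequences in the demo are truncated from the question, and it assumes plain FASTA input where the ID is the first whitespace-delimited word after ">". Unlike the Biopython script, it leaves headers without a mapping untouched and reports them instead of failing.

```python
import io

def load_id_map(handle):
    """Parse tab-separated 'old<TAB>new' lines into a dict, skipping blank lines."""
    id_map = {}
    for line in handle:
        if not line.strip():
            continue
        # Strip each field so stray spaces around an ID do not break the lookup
        old_id, new_id = (field.strip() for field in line.split("\t")[:2])
        id_map[old_id] = new_id
    return id_map

def rename_headers(fasta_lines, id_map):
    """Return (renamed_lines, missing_ids) for an iterable of FASTA lines.

    Headers whose ID is in id_map are replaced by '>' + new ID (any
    description after the ID is dropped); other headers are kept as they
    are and their IDs are collected in missing_ids.
    """
    renamed, missing = [], []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            old_id = fields[0] if fields else ""
            if old_id in id_map:
                line = ">" + id_map[old_id]
            else:
                missing.append(old_id)
        renamed.append(line)
    return renamed, missing

# Demo with the IDs from the question (sequences truncated for brevity)
demo_ids = io.StringIO(
    "Aquca_005_00546.1   \tRaAc00546A\n"
    "Aquca_014_00016.1\tRaAc00016E\n"
)
demo_fasta = io.StringIO(
    ">Aquca_005_00546.1\nKHMALQFAMNAMDEL\n"
    ">Aquca_014_00016.1\nDGWKVLTFENGVEIS\n"
)
demo_lines, demo_missing = rename_headers(demo_fasta, load_id_map(demo_ids))
print("\n".join(demo_lines))
print("IDs without a mapping:", demo_missing)
```

For real files, pass open file handles instead of the io.StringIO objects and write the returned lines to a new file. Checking the list of missing IDs before trusting the output is a cheap way to catch typos in the mapping file.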
