Adding Fasta unique identifiers
7.7 years ago
Rose • 0

Hi, I would like to add unique numeric identifiers to the headers of my FASTA files, changing them from:

>Ricinus_communis_APK1A
>Ricinus_communis_APK1B
>Ricinus_communis_APK1C

to

>1 Ricinus_communis_APK1A
>2 Ricinus_communis_APK1B
>3 Ricinus_communis_APK1C
sequence fasta • 2.5k views
7.7 years ago
frcamacho ▴ 210

A simple way is to iterate through the FASTA file with Python and store each header in a dict. If a header is already in the dict while iterating, it is a duplicate, and you can append a counter to it as an extra field. Something like this:

from Bio import SeqIO
import os

fastadir = "."  # directory containing the input file; an empty string makes os.chdir fail
fastafile = "input.fa"
outfile = "output-editedIDs.fa"

os.chdir(fastadir)
headerName = {}  # header -> number of duplicates seen so far
# 'w' rather than 'a', so re-running the script does not append the records twice
with open(outfile, 'w') as newFastaFile:
    # SeqIO.parse accepts a filename directly; the 'rU' mode is gone in recent Python versions
    for record in SeqIO.parse(fastafile, 'fasta'):
        record_id = record.id
        record_seq = str(record.seq)
        if record_id not in headerName:
            headerName[record_id] = 0
        else:
            # the header has been seen before, so we have duplicated fasta headers
            headerName[record_id] += 1
            record_id = record_id + " " + str(headerName[record_id])
        # writing only the id drops the rest of the description line
        newFastaFile.write(">" + record_id + "\n" + record_seq + "\n")
# the with block closes the file, so no explicit close() is needed
print("FINISHED WRITING TO FILE")
7.7 years ago
awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa
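For anyone less comfortable with awk: the one-liner keeps a counter, rewrites every line that starts with ">" as ">N original_header", and prints all other lines unchanged. A rough Python sketch of the same idea is below. The function name number_fasta_headers and the placeholder sequences are my own, used only for illustration.

```python
def number_fasta_headers(lines):
    """Yield FASTA lines with a running number inserted after '>' in each header."""
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # line[1:] drops the leading '>' and keeps the trailing newline
            yield ">%d %s" % (count, line[1:])
        else:
            # sequence lines pass through unchanged
            yield line


# Headers from the question; the sequences are made-up placeholders.
example = [
    ">Ricinus_communis_APK1A\n",
    "ACGT\n",
    ">Ricinus_communis_APK1B\n",
    "GGCC\n",
]
numbered = list(number_fasta_headers(example))
print("".join(numbered), end="")
```

Because the function takes any iterable of lines, it should also work when passed an open file handle instead of a list.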

It works, but the file is not modified. How can I save the modifications?

awk '/^>/ {printf(">%d %s\n",++N,substr($0,2));next;} {print;}' input.fa > output.fa
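Note that the redirect has to go to a different file: pointing it back at input.fa would make the shell truncate the file before awk gets to read it. If you really want the original file replaced, one option is to write to a temporary file and then swap it in. Here is a minimal Python sketch of that, assuming the whole file should be renumbered from 1. The function name number_headers_in_place is hypothetical, not part of any library.

```python
import os
import tempfile


def number_headers_in_place(path):
    """Rewrite the FASTA file at `path`, prefixing each header with a running number.

    Returns the number of headers that were renumbered.
    """
    directory = os.path.dirname(os.path.abspath(path))
    count = 0
    # Write to a temporary file in the same directory, then swap it in,
    # so the input is never truncated while it is still being read.
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            if line.startswith(">"):
                count += 1
                line = ">%d %s" % (count, line[1:])
            tmp.write(line)
    # Replace the original file only after the new one is completely written.
    os.replace(tmp.name, path)
    return count
```

Calling number_headers_in_place("input.fa") would then overwrite input.fa with the numbered version, so keep a backup if you may need the original headers again.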