Batch rename protein fasta headers
4.7 years ago

Hey guys,

I have tons of protein multi-fasta files and I would like to prepend the name of each file to its fasta headers. For example, for an input file one.txt with the headers

>1
ATGC...
>2
ATGCAT...

I would like to have the output

>one_1
ATGC...
>one_2
ATGCAT...

I use bbrename for DNA sequences, but it doesn't work for protein files. Thanks!

sequence • 2.3k views

Your example files also don't really look like protein either ...

4.7 years ago
awk '/^>/ {printf(">%s_%s\n",substr(FILENAME,1,length(FILENAME)-4),substr($1,2));next;} {print}' *.txt
4.7 years ago
Joe 21k

As easy as:

for file in /path/to/files/*.fasta ; do
    # Prefix each header with the file name (minus .fasta) and an underscore
    sed "s/^>/>$(basename "$file" .fasta)_/" "$file"
done

You can tweak it if you want to keep the extension or whatever...

4.7 years ago
Hood ▴ 40

You could use a simple Python script like:

from Bio import SeqIO

with open("one.txt", "r") as input:
    with open("output_filename.fasta", "w") as output:
        for record in SeqIO.parse(input, "fasta"):
            record.id = f"one_{record.id}"
            record.description = ""
            SeqIO.write(record, output, "fasta")

This requires Biopython to be installed.


Thanks for the reply. The thing is that I have multiple files to use as input. So, I guess using a loop to create a renamed output for each file would be better. Do you think you can help me with this? Usually for DNA sequences, I use

for F in *.fasta; do N=$(basename $F .fasta) ; bbrename.sh in=$F out=${N}_mod.fasta prefix=$F addprefix=t ; done

I need to find an alternative that works with multi-fasta protein files as input.
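
For the "one renamed output per file" loop asked for here, a minimal sketch in plain Python (standard library only, so it does not care whether the sequences are DNA or protein) could look like the following. The function names `prefix_headers` and `rename_all` are made up for illustration, the `_mod.fasta` output suffix is borrowed from the bbrename loop above, and it assumes the inputs end in `.fasta` and that header lines start with `>`.

```python
from pathlib import Path


def prefix_headers(src: Path, dst: Path) -> None:
    """Copy src to dst, prefixing every FASTA header with the file stem."""
    stem = src.stem  # "one.fasta" -> "one"
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if line.startswith(">"):
                # ">1 some description" -> ">one_1 some description"
                line = f">{stem}_{line[1:]}"
            fout.write(line)


def rename_all(folder: Path, pattern: str = "*.fasta") -> list:
    """Write <stem>_mod.fasta next to each matching input; return the outputs."""
    # Build the input list up front and skip earlier outputs, so a second
    # run does not rename the already renamed *_mod.fasta files again.
    inputs = [p for p in sorted(folder.glob(pattern))
              if not p.stem.endswith("_mod")]
    outputs = []
    for src in inputs:
        dst = src.with_name(f"{src.stem}_mod.fasta")
        prefix_headers(src, dst)
        outputs.append(dst)
    return outputs
```

Calling `rename_all(Path("."))` in the folder holding the files should leave the originals untouched and write `one_mod.fasta`, `two_mod.fasta`, and so on next to them. Sequence lines are copied as-is, so nothing is validated or re-wrapped.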

4.7 years ago
Fatima ▴ 1000
for f in *.fasta; do f=${f%.fasta}; sed "/^>/ s/^>/>${f}_/" "$f.fasta" > "${f}_new.fasta" ; done

4.7 years ago

Using seqkit

seqkit replace -p '(.+)' -r 'one_$1' Filename.fasta
