Question

I want to rename sequence headers

17 months ago

Riyad • 0

I have assembled transcript files containing thousands of sequences, with headers like:

>TRINITY_DN50_c0_g2_i1 len=1961 path=[0:0-1960]
>TRINITY_DN59_c0_g1_i2 len=1961 path=[0:0-1960]

But I want to rename them like this:

>TRINITY_1
>TRINITY_2

All sequences should keep the TRINITY prefix, followed by a sequential number. The total number of sequences is 40000.

Fasta • 2.0k views

ADD COMMENT • link updated 17 months ago by Hugo ▴ 380 • written 17 months ago by Riyad • 0

What file format?
What programming language?
Have you stored the information in the header in a separate location?

ADD REPLY • link 17 months ago by LauferVA 4.5k

In addition to what Mensur said, I would also state that renaming is not recommended because the string carries meaning. You will, for example, not be able to extract the longest isoform per gene from the edited file, and it will make reproducing subsequent analysis harder. Most tools should be able to deal with the Trinity identifiers. Unless a tool definitely does not support them, I would leave them as they are.
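
If you do rename anyway, one way to keep the original Trinity identifiers recoverable is to write an old-to-new ID table next to the renamed file. A minimal sketch with awk, assuming the new numbers follow record order (TRINITY_1, TRINITY_2, ...); the file names example.fasta and example.id_map.tsv are placeholders, and the two records are made-up test data:

```shell
# Tiny stand-in input (hypothetical data, not the real assembly)
printf '>TRINITY_DN50_c0_g2_i1 len=4 path=[0:0-3]\nACGT\n>TRINITY_DN59_c0_g1_i2 len=4 path=[0:0-3]\nTTGA\n' > example.fasta

# For each header line, write: new ID <TAB> original header (without the leading ">")
awk '/^>/ { printf "TRINITY_%d\t%s\n", ++n, substr($0, 2) }' example.fasta > example.id_map.tsv

cat example.id_map.tsv
```

With such a table, the gene and isoform information in the original names can still be looked up after renaming.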

ADD REPLY • link 17 months ago by Michael 55k

Thanks @Michael, I appreciate these suggestions.

ADD REPLY • link 17 months ago by Riyad • 0

17 months ago

Mark ★ 1.6k

Use seqkit replace, assuming your file name is trinity.fasta:

seqkit replace trinity.fasta -p "(.+)" -r "TRINITY_{nr}" > trinity.renamed.fasta

Where:

-p "(.+)" is the match pattern to match the whole header text
-r "TRINITY_{nr}" is the replacement pattern, where {nr} adds the record number.

See https://bioinf.shenwei.me/seqkit/usage/#replace for more information
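
If seqkit is not available, the same renaming can be sketched with plain awk. This is not what seqkit does internally, just an equivalent for this particular case; the file names example.fasta and example.renamed.fasta are placeholders, and the two records are made-up test data:

```shell
# Tiny stand-in input (hypothetical data, not the real assembly)
printf '>TRINITY_DN50_c0_g2_i1 len=4 path=[0:0-3]\nACGT\n>TRINITY_DN59_c0_g1_i2 len=4 path=[0:0-3]\nTTGA\n' > example.fasta

# Header lines (starting with ">") are replaced by ">TRINITY_<counter>";
# sequence lines are printed unchanged.
awk '/^>/ { printf ">TRINITY_%d\n", ++n; next } { print }' example.fasta > example.renamed.fasta

cat example.renamed.fasta
```

Writing to a new file rather than editing in place keeps the original assembly untouched.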

ADD COMMENT • link 17 months ago by Mark ★ 1.6k

Thanks @Mark, it's working nicely now with seqkit.

ADD REPLY • link 17 months ago by Riyad • 0

Please mark the answer as correct.

ADD REPLY • link 17 months ago by Mark ★ 1.6k

17 months ago

benformatics 4.1k

R version

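library(Biostrings)
# Read the FASTA file into a DNAStringSet
fa <- readDNAStringSet('your.fasta')
# Replace each header with TRINITY_<record number>
names(fa) <- paste0('TRINITY_', seq_along(fa))
writeXStringSet(fa, 'your_new.fasta', format='fasta')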

ADD COMMENT • link updated 17 months ago by Ram 44k • written 17 months ago by benformatics 4.1k

17 months ago

Hugo ▴ 380

You can also use SEDA (https://www.sing-group.org/seda). Specifically, you would use the "Rename header" operation first, to keep the "TRINITY" part of the headers using the "Multipart header" rename type.

Then, you would use the "Rename header" again, but this time with "Add prefix/suffix" rename type to add the indexes.

We will soon release a new SEDA version that comes with a CLI.

ADD COMMENT • link 17 months ago by Hugo ▴ 380

score 3 · Accepted Answer · 2023-07-09

17 months ago

Mensur Dlakic ★ 28k

Many posters think that their problems are unique, but in most cases that's not true. Yours, in particular, is one of the most frequently discussed problems. That means that searching for "rename fasta header" from the main page will give you numerous solutions.

https://www.biostars.org/post/search/?query=rename+fasta+header

ADD COMMENT • link 17 months ago by Mensur Dlakic ★ 28k

Thanks @Mensur

ADD REPLY • link 17 months ago by Riyad • 0