I have assembled transcript files containing thousands of sequences, with headers like:
>TRINITY_DN50_c0_g2_i1 len=1961 path=[0:0-1960]
>TRINITY_DN59_c0_g1_i2 len=1961 path=[0:0-1960]
I want to rename them like this:
>TRINITY_1
>TRINITY_2
All sequences should keep the TRINITY prefix, followed by a sequential number. There are 40,000 sequences in total.
What file format?
What programming language?
Have you stored the information in the header in a separate location?
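If not, it is worth writing the old headers to a lookup table at the same time as you rename. Here is a minimal sketch in Python, assuming the input is plain FASTA (header lines start with ">"). The function name rename_fasta and the short demo sequences are made up for illustration; only the two headers come from the post above.

```python
import io


def rename_fasta(src, dst, mapping, prefix="TRINITY"):
    """Copy FASTA records from src to dst with headers replaced by
    >{prefix}_{n}; write 'new_id<TAB>original header' lines to mapping.
    Returns the number of records renamed."""
    count = 0
    for line in src:
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}_{count}"
            # Keep the full original header so the renaming can be undone.
            mapping.write(f"{new_id}\t{line[1:].rstrip()}\n")
            dst.write(f">{new_id}\n")
        else:
            # Sequence lines are copied unchanged.
            dst.write(line)
    return count


# In-memory demo; the sequences are placeholders, not real data.
example = io.StringIO(
    ">TRINITY_DN50_c0_g2_i1 len=1961 path=[0:0-1960]\n"
    "ACGT\n"
    ">TRINITY_DN59_c0_g1_i2 len=1961 path=[0:0-1960]\n"
    "GGCC\n"
)
renamed, id_map = io.StringIO(), io.StringIO()
n_records = rename_fasta(example, renamed, id_map)
print(n_records)
print(renamed.getvalue(), end="")
print(id_map.getvalue(), end="")
```

For real data, pass open file handles instead of the StringIO objects, for example an input opened for reading and two outputs opened for writing. The file is streamed line by line, so 40,000 sequences should not be a problem.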
In addition to what Mensur said, I would also state that renaming is not recommended because the string carries meaning. You will, for example, not be able to extract the longest isoform per gene from the edited file, and it will make reproducing subsequent analysis harder. Most tools should be able to deal with the Trinity identifiers. Unless a tool definitely does not support them, I would leave them as they are.
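To illustrate what would be lost: in a Trinity identifier such as TRINITY_DN50_c0_g2_i1, everything up to "_g2" names the gene and "_i1" names the isoform, and the header also carries the length. A rough sketch in Python of picking the longest isoform per gene from the headers alone follows. The regular expression and the function name longest_isoform_per_gene are my own, and the second header in the demo is invented to show two isoforms of one gene.

```python
import re

# Gene ID = everything up to and including _g<number>; isoform = _i<number>.
# Assumes the "len=" field follows the identifier, as in the headers above.
HEADER_RE = re.compile(
    r"^>?(?P<gene>\S+_g\d+)_i(?P<isoform>\d+)\s+len=(?P<length>\d+)"
)


def longest_isoform_per_gene(headers):
    """Return {gene_id: (transcript_id, length)} for the longest isoform
    of each gene. Headers that do not match the pattern are skipped."""
    best = {}
    for header in headers:
        match = HEADER_RE.match(header)
        if match is None:
            continue  # e.g. a renamed header like >TRINITY_1
        gene = match.group("gene")
        length = int(match.group("length"))
        transcript_id = f"{gene}_i{match.group('isoform')}"
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript_id, length)
    return best


demo_headers = [
    ">TRINITY_DN50_c0_g2_i1 len=1961 path=[0:0-1960]",
    ">TRINITY_DN50_c0_g2_i2 len=2500 path=[0:0-2499]",  # invented isoform
    ">TRINITY_DN59_c0_g1_i2 len=1961 path=[0:0-1960]",
    ">TRINITY_1",  # renamed header: gene and length are no longer recoverable
]
for gene, (transcript, length) in longest_isoform_per_gene(demo_headers).items():
    print(gene, transcript, length)
```

A header like >TRINITY_1 simply falls through, because neither the gene grouping nor the length can be read from it any more.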
Thanks, @Michael. I appreciate these suggestions.