2.5 years ago
Percy
How do I edit FASTA headers so that everything before the last underscore is replaced with a number, with headers that share the same text before the last underscore getting the same number? For example, a section of my headers looks like this:
>NODE_23_length_59792_cov_23.204747_1
>NODE_23_length_59792_cov_23.204747_2
>NODE_23_length_59792_cov_23.204747_3
>NODE_23_length_59792_cov_23.204747_4
>NODE_23_length_59792_cov_23.204747_5
>NODE_23_length_59792_cov_23.204747_6
>NODE_23_length_59792_cov_23.204747_7
>NODE_23_length_59792_cov_23.204747_8
The desired output is:
>1_1
>1_2
>1_3
>1_4
>1_5
>1_6
>1_7
>1_8
Just curious: why is the replacement better than what you already have? There is important information in that header that will be completely lost, and saving a couple of characters in disk space doesn't seem worth it.
Duplicate of Renaming Entries In A Fasta File
From which character on do the headers differ? E.g. after node, after length, after cov?
Only the number after the last underscore differs.
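Given that clarification, one possible approach is to split each header at its last underscore, assign a running number to each distinct prefix in order of first appearance, and keep the suffix as it is. The following is only a minimal Python sketch under some assumptions: the function name `renumber_headers` is made up for illustration, headers contain no description after whitespace, and headers without any underscore are left unchanged. Note that the original contig information is lost unless you keep the mapping.

```python
def renumber_headers(lines):
    """Replace the part of each FASTA header before the last underscore
    with a number that is shared by all headers with the same prefix."""
    prefix_ids = {}  # prefix -> number, assigned in order of first appearance
    renamed = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split at the LAST underscore only.
            prefix, sep, suffix = line[1:].rpartition("_")
            if sep:
                # Reuse the number if this prefix was seen before,
                # otherwise hand out the next one (1, 2, 3, ...).
                number = prefix_ids.setdefault(prefix, len(prefix_ids) + 1)
                line = f">{number}_{suffix}"
            # Assumption: headers with no underscore are kept as they are.
        renamed.append(line)
    return renamed


# Small illustrative input (sequences are made up).
records = [
    ">NODE_23_length_59792_cov_23.204747_1",
    "ATGC",
    ">NODE_23_length_59792_cov_23.204747_2",
    "GGCC",
    ">NODE_24_length_100_cov_5.0_1",
    "TTAA",
]
result = renumber_headers(records)
print("\n".join(result))
```

To apply this to a file, read its lines with `open(path)` and pass them to the function, then write the result joined with newlines. If the long names may be needed later, it is worth also saving `prefix_ids` to a table.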