3.5 years ago
Anisur Rahman
Hi, I have a multifasta file like the example below:
>hCoV-19/Bangladesh/BCSIR-NILMRC-523/2021|EPI_ISL_1034736|2021-01-22
CCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGG
CGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTC
>hCoV-19/Bangladesh/BCSIR-NILMRC-515/2020|EPI_ISL_1034763|2020-12-24
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACG
AATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGT
>hCoV-19/Bangladesh/BCSIR-NILMRC-517/2020|EPI_ISL_1035809|2020-12-24
GGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTG
CTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCG
CTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGAT
As you can see, each header contains an accession ID (for example, EPI_ISL_1034736). I want to keep only that ID in the header, so the resulting file would look like this:
>EPI_ISL_1034736
CCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGG
CGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTC
>EPI_ISL_1034763
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACG
AATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGT
>EPI_ISL_1035809
GGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTG
CTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCG
CTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGAT
Can anyone help me achieve this? I can use the seqkit replace tool to rename headers with my own strings, but in this case I need to keep the sequence ID that is already in the header.
See if solutions here help: Fasta header trimming
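If none of those fit, here is a minimal Python sketch of one way to do it. It assumes every header has the form shown in the question, with the accession as the second `|`-separated field. The function names `keep_accession` and `rename_headers` are made up for this example and do not come from any library. Headers with no `|` are left unchanged.

```python
import sys


def keep_accession(header: str) -> str:
    """Return a FASTA header that keeps only the second '|'-separated field.

    Assumes headers look like '>name|EPI_ISL_xxxxxxx|date'.
    """
    fields = header[1:].split("|")
    # If there is no second field, return the header untouched.
    if len(fields) < 2:
        return header
    return ">" + fields[1].strip()


def rename_headers(lines):
    """Yield lines with header lines rewritten and sequence lines unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield keep_accession(line)
        else:
            yield line


if __name__ == "__main__":
    # Reads FASTA from standard input and writes the result to standard output.
    for out_line in rename_headers(sys.stdin):
        print(out_line)
```

Saved as, say, `rename_headers.py` (a hypothetical file name), it could be run as `python rename_headers.py < input.fasta > output.fasta`. Sequence lines pass through as they are, so the original line wrapping is preserved.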