2.4 years ago
pinn
Hi,
I have thousands of sequences in a FASTA file. I'd like to delete the underscore and number (_1, _2, _34297, ...) at the end of the FASTA headers.
Original file
>XP_034398789.1_1
>XP_034398430.1_2
....
....
....
>XP_034381508.1_34297
>XP_034419373.1_34330
>XP_034419129.1_34363
>XP_034385161.1_38667
Expected output
>XP_034398789.1
>XP_034398430.1
....
....
....
>XP_034381508.1
>XP_034419373.1
>XP_034419129.1
I tried cut on sample data, but it deletes the ">XP_" prefix.
What would be the cut command for deleting the characters/numbers after XP_034398789.1 (that is, the trailing _1)?
cut -f2 -d'_' TEXT.fa.fa | sed '15~20s/^/>/'
034419421.1
034380977.1
034381532.1
cut -d_ -f1,2 TEXT.fa.fa
>XP_034398789.1
>XP_034398430.1
....
....
....
>XP_034381508.1
>XP_034419373.1
>XP_034419129.1
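The cut command above works for these headers because each one contains exactly two underscores, so fields 1 and 2 are everything before the suffix. If some headers might have a different number of underscores, a sed pattern anchored to the end of the line may be safer. This is only a sketch: the function name strip_header_suffix is made up for illustration, and it assumes the suffix is always an underscore followed by digits, with Unix line endings.

```shell
# Hypothetical helper (name is illustrative): remove a trailing _<digits>
# from FASTA header lines only; sequence lines pass through unchanged.
strip_header_suffix() {
    # /^>/        restricts the substitution to header lines
    # _[0-9]+$    matches the final underscore plus digits at end of line
    sed -E '/^>/ s/_[0-9]+$//' "$@"
}

# Small sample in the same shape as the data in the question.
printf '>XP_034398789.1_1\nMKTAYIAK\n>XP_034381508.1_34297\nGGSLV\n' | strip_header_suffix
```

On the real file this would be called as strip_header_suffix TEXT.fa.fa > cleaned.fa (the output file name is an assumption). Writing to a new file rather than editing in place makes it easy to check the headers first, e.g. with grep '^>' cleaned.fa | head.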
There are plenty of fasta-header-editing posts on the forum (I'm sure you would have seen a few in the years you've been here), and "delete everything after second underscore" will produce a ton of Google results. Did you try searching anywhere before creating a new post?