2.2 years ago
kcl58759
Hi, I need help writing a command to remove part of the header from my scaffold FASTA file. I have headers that look like:
>scaffold3247|size3454
TTATATAACTAATTAGATAAAATAGCTAATAATAAAAGCTTCTATATAACTAGCCTTCTTTTAATCTATATAATAAGCTTAGCTAATAAAAAGGCCCACT
TTTTTTTCCA
>scaffold11172|size823
GCTCAGCATGCCGTTGCCAACGCCGCGGGCGCTCATTTGCTGCAATCCAGCCGCCTTATTCCTGCTGCTGTCCTTGAGAGCCACGAGCCGGCCACCGTTG
ACAAACGTCTGGAACCGTAACCCAGACTCAGGCCCTTTGTAAGGCAGAGGCAGGAGCATGTTGACACTCCCGGCTGCGAAAAGATCACCACCAACAGCGT
CTTGACCATCGTGAGGCCCCAGC
and I need to get rid of the |size part, so that the file looks like:
>scaffold3247
TTATATAACTAATTAGATAAAATAGCTAATAATAAAAGCTTCTATATAACTAGCCTTCTTTTAATCTATATAATAAGCTTAGCTAATAAAAAGGCCCACT
TTTTTTTCCA
>scaffold11172
GCTCAGCATGCCGTTGCCAACGCCGCGGGCGCTCATTTGCTGCAATCCAGCCGCCTTATTCCTGCTGCTGTCCTTGAGAGCCACGAGCCGGCCACCGTTG
ACAAACGTCTGGAACCGTAACCCAGACTCAGGCCCTTTGTAAGGCAGAGGCAGGAGCATGTTGACACTCCCGGCTGCGAAAAGATCACCACCAACAGCGT
CTTGACCATCGTGAGGCCCCAGC
I am a novice at this. I am sure there is a way to do it with awk or sed, but I am quite lost! Any help would be greatly appreciated!
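One possible approach, as a minimal sketch rather than the definitive answer: a sed substitution restricted to header lines (those starting with `>`) that deletes the first `|` and everything after it. The function name `strip_size` and the file names in the usage note are placeholders, not anything from the post.

```shell
# strip_size: hypothetical helper name. Reads FASTA on stdin, writes to stdout.
# /^>/      -> only act on header lines
# s/|.*$//  -> delete the first '|' and everything after it on that line
# Sequence lines pass through unchanged.
strip_size() {
    sed '/^>/ s/|.*$//'
}

# Small demonstration on one record from the question:
printf '>scaffold3247|size3454\nTTTTTTTCCA\n' | strip_size
# prints:
# >scaffold3247
# TTTTTTTCCA
```

On a real file this could be run as `strip_size < scaffolds.fasta > scaffolds.clean.fasta` (both file names assumed). Writing to a new file rather than editing in place keeps the original available for comparison.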
I have fasta sequences example
How do I remove the '_' at the end of a header line?
On one hand this should be a new question; on the other, formatting FASTA headers is about the most common question here. You might either find the solution or learn some regular expressions in sed. Hint:
s/_$//
in the above command should do the trick.
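Spelled out as a complete command, the hint might look like the sketch below. The `/^>/` address, which limits the substitution to header lines, is an assumption on my part, and `strip_trailing_underscore` is a made-up helper name.

```shell
# strip_trailing_underscore: hypothetical helper name. Reads FASTA on stdin.
# /^>/    -> only act on header lines (assumed; the hint itself is just s/_$//)
# s/_$//  -> remove a single '_' if it is the last character of the line
strip_trailing_underscore() {
    sed '/^>/ s/_$//'
}

# Underscores elsewhere in the header are kept; only the trailing one goes.
printf '>seq_1_\nACGT\n' | strip_trailing_underscore
# prints:
# >seq_1
# ACGT
```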