Remove part of a header in a fasta file
2.1 years ago
kcl58759 • 0

Hi, I need help writing a command to remove part of the header from my scaffold FASTA file. My headers look like this:

>scaffold3247|size3454
TTATATAACTAATTAGATAAAATAGCTAATAATAAAAGCTTCTATATAACTAGCCTTCTTTTAATCTATATAATAAGCTTAGCTAATAAAAAGGCCCACT
TTTTTTTCCA

>scaffold11172|size823
GCTCAGCATGCCGTTGCCAACGCCGCGGGCGCTCATTTGCTGCAATCCAGCCGCCTTATTCCTGCTGCTGTCCTTGAGAGCCACGAGCCGGCCACCGTTG
ACAAACGTCTGGAACCGTAACCCAGACTCAGGCCCTTTGTAAGGCAGAGGCAGGAGCATGTTGACACTCCCGGCTGCGAAAAGATCACCACCAACAGCGT
CTTGACCATCGTGAGGCCCCAGC

I need to get rid of the |size part, so that the result looks like this:

>scaffold3247
TTATATAACTAATTAGATAAAATAGCTAATAATAAAAGCTTCTATATAACTAGCCTTCTTTTAATCTATATAATAAGCTTAGCTAATAAAAAGGCCCACT
TTTTTTTCCA

>scaffold11172
GCTCAGCATGCCGTTGCCAACGCCGCGGGCGCTCATTTGCTGCAATCCAGCCGCCTTATTCCTGCTGCTGTCCTTGAGAGCCACGAGCCGGCCACCGTTG
ACAAACGTCTGGAACCGTAACCCAGACTCAGGCCCTTTGTAAGGCAGAGGCAGGAGCATGTTGACACTCCCGGCTGCGAAAAGATCACCACCAACAGCGT
CTTGACCATCGTGAGGCCCCAGC

I am a novice at this. I am sure there is a way to do it with awk or sed, but I am quite lost! Any help would be greatly appreciated!

fasta • 1.1k views

I have FASTA sequences, for example:

>scaffold_71454_ 
TTATATAACTAATTAGATAAAATAGCTAATAATAAAAGCTTCTATATAACTAGCCTTCTTTTAATCTATATAATAAGCTTAGCTAATAAAAAG
>scaffold_72823_ 
GCTCAGCATGCCGTTGCCAACGCCGCGGGCGCTCATTTGCTGCAATCCAGCCGCCTTATTCCTGCTGCTGTCCTTGAGAGCCAC

How can I remove the '_' at the end of each header line?


On the one hand, this should be a new question; on the other, formatting FASTA headers is about the most common question here. You might either find an existing solution or learn some regular expressions for sed. Hint: using s/_$// as the sed expression should do the trick.
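To make the hint concrete, here is a minimal sketch, assuming the headers end in a single "_" that may be followed by stray whitespace (as the pasted example seems to show). The file names are hypothetical, and the /^>/ address is my addition so that only header lines are touched.

```shell
# Build a tiny demo file (hypothetical name); the second header has a trailing space.
printf '>scaffold_71454_\nTTATATAACT\n>scaffold_72823_ \nGCTCAGCATG\n' > demo_underscore.fa

# On header lines only, drop a final "_" together with any trailing whitespace.
sed '/^>/ s/_[[:space:]]*$//' demo_underscore.fa > demo_underscore.clean.fa

# Show the cleaned file.
cat demo_underscore.clean.fa
```

If the headers have no trailing whitespace, the plain s/_$// from the hint is enough.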

2.1 years ago
iraun 6.2k

A simple cut command could do it:

cut -d'|' -f1 input.fa > output.fa
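Note that cut runs on every line, not just the headers. That is fine here because cut prints lines without the delimiter unchanged, and sequence lines contain no "|". A small sketch for trying it out and sanity-checking the result, using a made-up demo file (the file names are assumptions, not the asker's real data):

```shell
# Tiny demo input with two "|size" headers (hypothetical file name).
printf '>scaffold3247|size3454\nTTATATAACT\n>scaffold11172|size823\nGCTCAGCATG\n' > demo_cut.fa

# Keep only the text before the first "|" on each line.
cut -d'|' -f1 demo_cut.fa > demo_cut.clean.fa

# Sanity checks: the header count should be unchanged and no "|" should remain.
grep -c '^>' demo_cut.fa            # prints 2
grep -c '^>' demo_cut.clean.fa      # prints 2
grep -c '|' demo_cut.clean.fa || true   # prints 0 (grep exits non-zero when nothing matches)
```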
2.1 years ago
liorglic ★ 1.4k

Or you could use sed:

sed 's/|.*//' input.fa > output.fa
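The sed substitution is also applied to every line. If you would rather be explicit that only headers are edited, one possible variant (a sketch, with hypothetical file names) adds a /^>/ address in front of the substitution:

```shell
# Tiny demo input (hypothetical file name).
printf '>scaffold3247|size3454\nTTATATAACT\n>scaffold11172|size823\nGCTCAGCATG\n' > demo_sed.fa

# Only on lines starting with ">", delete everything from the first "|" onward.
sed '/^>/ s/|.*//' demo_sed.fa > demo_sed.clean.fa

# Show just the cleaned headers.
grep '^>' demo_sed.clean.fa
```

For this particular file the result is the same as without the address, since no sequence line contains a "|".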

12 weeks ago
bk11 ★ 3.0k

Or, if you would like to use awk:

awk '/^>/{gsub(/\|size[0-9]+/,"",$0)}1' input.fasta >output.fasta
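An alternative awk sketch, in case the part after the "|" is not always "size" followed by digits: split header lines on "|" and keep only the first field. This is just one way to do it, and the file names below are hypothetical.

```shell
# Tiny demo input (hypothetical file name).
printf '>scaffold3247|size3454\nTTATATAACT\n>scaffold11172|size823\nGCTCAGCATG\n' > demo_awk.fa

# With "|" as the field separator: print only field 1 for header lines,
# and print every other line unchanged.
awk -F'|' '/^>/ { print $1; next } { print }' demo_awk.fa > demo_awk.clean.fa

# Show the cleaned file.
cat demo_awk.clean.fa
```

Unlike the gsub version, this drops everything after the first "|" in a header, whatever it is.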
