Trimming of fasta file headers
Asked 4.8 years ago by 2822462298

I have a fasta file with the following format:

>BNY.1.2.t17987.mrna1 CDS=1-1065
seq...

How can I remove everything after ".mrna1" from the headers?

Tags: fasta, RNA-Seq, RNA, transcriptome
Answer by tshtatland (4.8 years ago):

Use this Perl one-liner:

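printf '>BNY.1.2.t17987.mrna1 CDS=1-1065\nACTG\n' | \
    perl -lpe '
        s{\s.*}{};          # delete the first whitespace and everything after it
        s{[.][^.]+\z}{};    # delete the last "." and everything after it
    '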

For the header you showed, it prints:

>BNY.1.2.t17987
ACTG

The command-line flags used here are:

-e: run the code given on the command line instead of reading it from a file;
-p: loop over each line of the input files or, if none are given, STDIN; assign the line to $_, execute the code, then print $_;
-l (lowercase "L"): strip the input record separator (by default "\n" on *NIX) from each line, and set the output record separator so that print adds it back.

The code performs two substitutions, each applied to $_ by default. The first deletes the first whitespace character and everything after it, which trims " CDS=1-1065". The second deletes the last "." and everything after it up to the end of the string ("\z"), which trims ".mrna1".


Thank you for your reply! That's really helpful.

Another answer (4.8 years ago):

cut -f1 -d' ' myfasta.fa
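A quick check of what this produces on the header from the question: cut keeps the first space-separated field, so " CDS=1-1065" is dropped but ".mrna1" stays. Lines without a space, such as sequence lines, are passed through unchanged, which is cut's default behaviour unless -s is given. The wrapper name keep_first_field is hypothetical.

```shell
# Hypothetical wrapper around the cut command above.
keep_first_field() {
    cut -d' ' -f1 "$@"
}

printf '>BNY.1.2.t17987.mrna1 CDS=1-1065\nACTG\n' | keep_first_field
# prints:
# >BNY.1.2.t17987.mrna1
# ACTG
```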


Hi, thank you for your reply. I may need to remove ".mrna1" as well.
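If ".mrna1" should go too, one option is to pipe the cut output through sed and delete the last "." and what follows it on header lines only, which mirrors the second Perl substitution in the first answer. This swaps in sed as the tool; the function name trim_headers_cut_sed is hypothetical, and -E (extended regular expressions) is accepted by both GNU and BSD sed.

```shell
# Hypothetical helper: cut off everything from the first space, then, on
# header lines only, remove the last "." and everything after it.
trim_headers_cut_sed() {
    cut -d' ' -f1 "$@" | sed -E '/^>/s/\.[^.]+$//'
}

printf '>BNY.1.2.t17987.mrna1 CDS=1-1065\nACTG\n' | trim_headers_cut_sed
```

Usage: trim_headers_cut_sed myfasta.fa > trimmed.fa (the output name is only an example). To remove only the literal suffix rather than whatever follows the last ".", the sed expression could be '/^>/s/\.mrna1$//' instead.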

