I have a fasta file with the following format:
>BNY.1.2.t17987.mrna1 CDS=1-1065
seq...
How can I remove everything after ".mrna1" from the headers?
Use this Perl one-liner:
printf '>BNY.1.2.t17987.mrna1 CDS=1-1065\nACTG\n' | \
perl -lpe '
s{\s.*}{};
s{[.][^.]+\z}{};
'
For the example header you showed, it prints:
>BNY.1.2.t17987
ACTG
Here, these command line flags are used:
-e : tells the Perl interpreter to look for the code in-line instead of in a file;
-p : loop over each line of the input file or, if none, STDIN; assign the line to $_, execute the code provided, then print $_;
-l : (lowercase "L") strip the input line separator (default "\n" on *NIX), then add it during printing.
The code does 2 substitutions, each done on $_ by default. The first one deletes the first whitespace character and everything after it, which trims " CDS=1-1065". The second one deletes the last "." and everything after it up to the end of the line ("\z"), which trims ".mrna1".
Thank you for your reply! That's really helpful