Fasta file manipulation
10.5 years ago
GP

Hi All,

I want to extract the sequences that don't contain * (the character marking a stop codon) from a FASTA file. Is there a one-liner or an easy way to do this from the command line (macOS)? If anyone could point me to a similar post on this forum, that would also be very helpful.

Thanks!!

10.5 years ago

Linearize the FASTA (one line per record), filter out the records whose sequence contains *, then convert back to FASTA:

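# 1st awk: one "header<TAB>sequence" line per record (newline only before the 2nd+ header, so no leading empty line)
# 2nd awk: keep only records whose sequence (field 2) contains no '*'
# tr: split each record back into a header line and a sequence line
awk '/^>/ {if (NR > 1) printf("\n"); printf("%s\t", $0); next;} {printf("%s", $0);} END {printf("\n");}' file.fa |\
awk -F '\t' '!($2 ~ /\*/)' |\
tr "\t" "\n"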
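A quick way to check the linearize / filter / restore idea on a toy input. This is a sketch only: the file names toy.fa and no_stop.fa and the three records are made up for illustration, and seq1 is deliberately wrapped over two lines to show why the linearization step is needed. The first awk only emits the record separator from the second header onward (NR > 1), so the output does not start with an empty line. Note that the sequences come out on a single line each.

```shell
#!/bin/sh
# Hypothetical toy input: seq1 (wrapped over two lines) and seq3 contain a '*'
printf '>seq1 has a stop\nMKT\nAA*\n>seq2 clean\nMKLV\nWQ\n>seq3 stop at end\nMPQ*\n' > toy.fa

# Linearize to "header<TAB>sequence", drop records whose sequence has a '*',
# then turn the tab back into a newline to restore FASTA
awk '/^>/ {if (NR > 1) printf("\n"); printf("%s\t", $0); next;} {printf("%s", $0);} END {printf("\n");}' toy.fa |
awk -F '\t' '!($2 ~ /\*/)' |
tr '\t' '\n' > no_stop.fa

cat no_stop.fa
# prints:
# >seq2 clean
# MKLVWQ
```

Because the filter only looks at field 2, a * appearing in a header line would not cause the record to be dropped.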

Thanks for the fast response! It works perfectly :) By the way, what would I have to change in this command to print the sequences that do have * in them?

Remove the ! in the second awk command, i.e. use awk -F '\t' '($2 ~ /\*/)'
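The inverse filter spelled out as a small runnable sketch: the same linearize / filter / restore pipeline with the ! removed keeps only the records whose sequence contains a *. The file names toy.fa and with_stop.fa and the records are hypothetical, chosen only to illustrate the behaviour.

```shell
#!/bin/sh
# Hypothetical toy input: only seq1 and seq3 contain a stop codon '*'
printf '>seq1\nMKT\nAA*\n>seq2\nMKLVWQ\n>seq3\nMPQ*\n' > toy.fa

# Same pipeline, but the second awk has no '!',
# so it keeps the records whose sequence matches '*'
awk '/^>/ {if (NR > 1) printf("\n"); printf("%s\t", $0); next;} {printf("%s", $0);} END {printf("\n");}' toy.fa |
awk -F '\t' '($2 ~ /\*/)' |
tr '\t' '\n' > with_stop.fa

grep -c '^>' with_stop.fa   # prints 2
```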

Great, thanks again Pierre!!
