extract sequence problem
5.2 years ago

Hi, I have a FASTA file like this:

>TRINITY_DN100000_c1_g1::TRINITY_DN100000_c1_g1_i3::g.3039::m.3039 TRINITY_DN100000_c1_g1::TRINITY_DN100000_c1_g1_i3::g.3039  ORF type:complete len:100 (-) TRINITY_DN100000_c1_g1_i3:1027-1326(-)
MVWIKFRGLHRVLTSTPLVKSGKTPSQTWAFLDISVELIVFLFLNVHKSPMPHFKIYSEA
FSEEWSLLWLQYSRHLIQKPKPWQIKIELLHLCCCNRLC*
>TRINITY_DN100000_c1_g6::TRINITY_DN100000_c1_g6_i2::g.84365::m.84365 TRINITY_DN100000_c1_g6::TRINITY_DN100000_c1_g6_i2::g.84365  ORF type:complete len:112 (-) TRINITY_DN100000_c1_g6_i2:379-714(-)
MEMMQEIIPFAREMLSARPSKGTMKVYLVGGTFAVLGIVSGMVEAACSLFPEQEESTLTK
LMEDCLTVTAQNQEPQTFIPEDDEQDAEMEAKAKDLPMFRQRRMSFRAHAS*

If I want to simplify it like this:

>TRINITY_DN100000_c1_g1_i3
MVWIKFRGLHRVLTSTPLVKSGKTPSQTWAFLDISVELIVFLFLNVHKSPMPHFKIYSEA
FSEEWSLLWLQYSRHLIQKPKPWQIKIELLHLCCCNRLC*
>TRINITY_DN100000_c1_g6_i2
MEMMQEIIPFAREMLSARPSKGTMKVYLVGGTFAVLGIVSGMVEAACSLFPEQEESTLTK
LMEDCLTVTAQNQEPQTFIPEDDEQDAEMEAKAKDLPMFRQRRMSFRAHAS*

what command should I use? I know Python can do this easily, but is there a simple one-line command that could do it?

RNA-Seq • 862 views
$ sed '/^>/ s/.*::\(.*\)::.*::.*::.*::.*/>\1/g' test.fa   

>TRINITY_DN100000_c1_g1_i3
MVWIKFRGLHRVLTSTPLVKSGKTPSQTWAFLDISVELIVFLFLNVHKSPMPHFKIYSEA
FSEEWSLLWLQYSRHLIQKPKPWQIKIELLHLCCCNRLC*
>TRINITY_DN100000_c1_g6_i2
MEMMQEIIPFAREMLSARPSKGTMKVYLVGGTFAVLGIVSGMVEAACSLFPEQEESTLTK
LMEDCLTVTAQNQEPQTFIPEDDEQDAEMEAKAKDLPMFRQRRMSFRAHAS*
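
The pattern above appears to depend on each header containing exactly five `::` separators, since the greedy `.*` parts have to line up with them. As a rough sketch that anchors on the first two fields instead, something like the following may be a little more forgiving. The function name `simplify_headers` is just an illustrative label, and `-E` assumes a sed with extended regex support (GNU and BSD sed both have it).

```shell
# Hypothetical helper: keep only the second '::'-delimited field of each FASTA header.
# Sequence lines (those not starting with '>') pass through untouched.
simplify_headers() {
  # [^:]+ matches a run of non-colon characters, so the capture group
  # stops at the second '::' no matter how many separators follow.
  sed -E '/^>/ s/^>[^:]+::([^:]+)::.*/>\1/' "$@"
}

# Usage, with the file name from the answer above:
# simplify_headers test.fa > simplified.fa
```

Headers that do not have at least two `::` separators are left unchanged by this version.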
5.2 years ago
stcatpang ▴ 60

Try: sed 's/::.*//' input > output


Hi, thanks. So the command is sed 's/::.*//' input > output, which keeps the first part of the header, ">TRINITY_DN100000_c1_g1". If we want to keep the second part, ">TRINITY_DN100000_c1_g1_i3", instead, how should the command be modified?

Thank you!!!
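
One possible way to keep the second part, sketched with awk rather than sed: treat `::` as the field separator and print field 2 on header lines. The function name `keep_second_field` is only an illustrative label, and `input` / `output` are the placeholder file names used above.

```shell
# Hypothetical helper: on header lines, print '>' followed by the second
# '::'-separated field; print every other line as it is.
keep_second_field() {
  # NF > 1 guards against headers with no '::' at all, which are passed through.
  awk -F'::' '/^>/ && NF > 1 { print ">" $2; next } { print }' "$@"
}

# Usage:
# keep_second_field input > output
```

Changing `$2` to another field number would select a different part of the header, though later fields in these headers also contain the description text after the space.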
