Dear friends, hi! (I am not a native English speaker, so be ready for some . . . )
I have a TransDecoder output .pep file. Its head is as follows:
>TRINITY_DN100001_c0_g2::TRINITY_DN100001_c0_g2_i1::g.52916::m.52916 TRINITY_DN100001_c0_g2::TRINITY_DN100001_c0_g2_i1::g.52916 ORF type:3prime_partial len:142 (+) TRINITY_DN100001_c0_g2_i1:180-602(+)
MEEGEQLQLNRGVRHSQDRCSGEQIKTRAVRATPSTLSSTSRGINLKTFWHKGATGTTVK
IVLQEKHRRACVYSGKTYSHGEVWHPVLRPHRLLECILCTCKDGKQECRKITCPSEYPCQ
YPEKPEGKCCKTCPETKEETN
>TRINITY_DN100001_c0_g2::TRINITY_DN100001_c0_g2_i2::g.52918::m.52918 TRINITY_DN100001_c0_g2::TRINITY_DN100001_c0_g2_i2::g.52918 ORF type:complete len:324 (+) TRINITY_DN100001_c0_g2_i2:252-1223(+)
MKHLLFFFSFFLYFTSEAEAPRPRKTLETFCTFKEKRYNPGDSWHPYLEPHGFMFCIRCT
CAETGHVNCNSIKCPVLQCENPVIDSQQCCPRCAAEPKSPVGLRAPLKSCQYNGTIYQAG
EMFTSDELFPSRQPNQCVLCSCSNGNIFCGLRTCLKLTCSTPVSVPDTCCQLCKDHSDSP
ANPKYASMEEGEQLQLNRGVRHSQDRCSGEQIKTRAVRATPSTLSSTSRGINLKTFWHKG
ATGTTVKIVLQEKHRRACVYSGKTYSHGEVWHPVLRPHRLLECILCTCKDGKQECRKITC
PSEYPCQYPEKPEGKCCKTCPGM*
I also have a .txt file containing a list of Trinity IDs whose records I want to extract from the .pep file above.
I need to collect all the lines related to each ID in my list. As you can see, each ID is repeated several times in the lines of its record, the number of lines differs from one ID to another, and the records end differently: sometimes with * and sometimes not.
Could you please help me with a command line or script that accepts a list.txt file for this purpose?
~ Best
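A minimal Python sketch of such a script, assuming list.txt holds one Trinity ID per line and that an ID may be either gene-level (e.g. TRINITY_DN100001_c0_g2) or transcript-level (e.g. TRINITY_DN100001_c0_g2_i1). The script name extract_ids.py and the output name subset.pep are hypothetical. Everything from one ">" header to the next is treated as a single record, so the number of sequence lines and a trailing * do not matter. The first word of each header is split on "::" and each field is compared against a set of IDs, so an ID only matches a complete field (TRINITY_DN100001_c0_g2 would not pick up a hypothetical TRINITY_DN100001_c0_g20).

```python
#!/usr/bin/env python3
# extract_ids.py (hypothetical name): copy every record of a TransDecoder .pep
# (FASTA) file whose header contains one of the Trinity IDs listed in a text file.
import sys


def load_ids(path):
    """Read one ID per line, skipping blank lines and stripping whitespace."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def iter_records(lines):
    """Yield (header, sequence_lines) for each record.

    A record runs from a '>' line up to the next '>' line, so the number of
    sequence lines and whether the last one ends with '*' make no difference.
    Lines before the first header are ignored.
    """
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def header_ids(header):
    """Split the first word of a header on '::'.

    '>GENE::TRANSCRIPT::g.1::m.1 ...' -> ['GENE', 'TRANSCRIPT', 'g.1', 'm.1']
    """
    words = header[1:].split()
    return words[0].split("::") if words else []


def extract(ids, lines):
    """Yield the header and sequence lines of every record matching an ID."""
    for header, seq in iter_records(lines):
        # Whole-field comparison: the ID 'X_g2' matches the field 'X_g2'
        # exactly and never the field 'X_g20'.
        if any(field in ids for field in header_ids(header)):
            yield header
            yield from seq


def main(id_path, pep_path):
    ids = load_ids(id_path)
    with open(pep_path) as pep:
        for line in extract(ids, pep):
            print(line)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python extract_ids.py list.txt input.pep > subset.pep",
              file=sys.stderr)
```

Usage: python extract_ids.py list.txt input.pep > subset.pep. A set lookup keeps the per-header check fast even when list.txt contains many thousands of IDs.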
You should also give us an example of the IDs file so we can be more confident in the response. You can whip up a bioawk script that does this, or modify this so the comparison is a regex instead of an exists check.
Dear Ram, hi, and thanks. It would be as follows:
As I said in my previous comment, you're going to need to use bioawk or modify my script. Just make sure your regular expression accounts for a prefix, the :: separators, and a suffix.
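One way to express that regular expression in Python is sketched below. It is not Ram's script (which is not reproduced in this thread), and the function names build_id_regex and filter_pep are hypothetical. The prefix is either ">" at the start of the header or a "::" separator, and the suffix is a lookahead for "::", whitespace, or end of line, so an ID only matches a complete field. The keep/skip decision is made at each header line and carries over to the sequence lines that follow, much like a flag in an awk script.

```python
import re


def build_id_regex(ids):
    """Compile a single pattern matching any listed ID as a whole '::' field."""
    ids = sorted(set(ids))  # sorting only makes the pattern deterministic
    if not ids:
        raise ValueError("the ID list is empty")
    alternatives = "|".join(re.escape(i) for i in ids)
    # prefix: the ID must follow '>' at line start, or a '::' separator
    # suffix: the ID must be followed by '::', whitespace, or end of line
    return re.compile(r"(?:^>|::)(?:" + alternatives + r")(?=::|\s|$)")


def filter_pep(lines, pattern):
    """Yield lines of every record whose header matches the pattern.

    The keep/skip decision is made at each '>' header and applies to the
    following sequence lines until the next header.
    """
    keep = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            keep = pattern.search(line) is not None
        if keep:
            yield line
```

With the IDs in a Python list ids, filter_pep(open("input.pep"), build_id_regex(ids)) yields the selected lines. With thousands of IDs, one large alternation can be slow; splitting the header on "::" and checking each field against a Python set is usually faster.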