3.0 years ago
sherafzalk769
Hi all,
From the file below, I want to select some protein IDs together with their amino acid sequences. How can I do this on Linux, using the grep command or any other tool you would suggest?
>CALMACT00000025247|CALMACG00000015281|Seq_id=6096_quiver|type=cds
HGHPCLLNAPYLNCRSKNRGTQNFELPHNGLLTLFLDDCHALYLL*
>CALMACT00000001382|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MANKPEKPTENTMEEKEEKEKKESFGLHLDLLMDSMEETRFKKKYSELIEKLVHETHFND
VEVESLLIIYYKLAKENMKTEKYKEKSVTKEQFRDFLHCALDMTDDTLMDRIFVSLDRMP
GSTITMETYAHALSLMLRGTLEEKINFCFRIGIF*
>CALMACT00000001383|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MGDGLLGRDAMFYLLRNSLVSLSGEDDAEESVKDMIEVITKKLDVDRDGKISFQDYKQTV
LKQPALLEVFGQCLPSRGAVYTFSTTFSSNPNLKM*
>CALMACT00000001384|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MANKPEKPTENTMEEKEEKEKKESFGLHLDLLMDSMEETRFKKKYSELIEKLVHETHFND
VEVESLLIIYYKLAKENMKTEKYKEKSVTKEQFRDFLHCALDMTDDTLMDRIFVSLDRMP
GSTITMETYAHALSLMLRGTLEEKINFCFRVNNAQIRVYDTMGDGLLGRDAMFYLLRNSL
VSLSGEDDAEESVKDMIEVITKKLDVDRDGKISFQDYKQTVLKQPALLEVFGQCLPSRGA
VYTFSTTFSSNPNLKM*
>CALMACT00000001385|CALMACG00000000849|Name=efcab1|Seq_id=308_quiver|type=cds
MENASVLFKLESVVYDTMGDGLLGRDAMFYLLRNSLVSLSGEDDAEESVKDMIEVITKKL
DVDRDGKISFQDYKQTVLKQPALLEVFGQCLPSRGAVYTFSTTFSSNPNLKM*
>CALMACT00000016141|CALMACG00000009873|Name=mns1|Seq_id=308_quiver|type=cds
MAERYTESEFEEIPEATGQHELDHGLLIEAIQGYPYLYDMTNQLYKNVKKKAEAWEIIAG
LLDSTVQDCMKAWKSLRDRYVKEKNRCASGSEAPESIWRYFDSMHFYAKFTKPRKTHTRP
MKTTQSGKSSESSRPSSTMSMWSPVDEVVCEDETLTNTPTRSEEGPGPSRRKRRLSLDTP
ISSGKSAKRVENSFLEVAKEIIDKIDKKQHQMNPNKTFCDYLFTELEKLPEAEAKEKRRQ
ILLYLLEKLAEDAKLDQLSEQKRRMKMLQLRRDTERMMIERRQKHAEEMQLLMKIEDEAQ
AELEIKRKIVEEERHRMLGEHVKNLIGYLPKGILNVDDLPHLDKEVVEKVSPGYTKN*
>CALMACT00000007948|CALMACG00000004977|Seq_id=308_quiver|type=cds
MKNLRMEGKEIHQTGKGITQKGKLKKAKTMKDVECKCHYNCNNSISKEARHK
>CALMACT00000025994|CALMACG00000015721|Seq_id=308_quiver|type=cds
MLWAGDGTRELSCFNYRGVDRLTPPNHCVFVLSASVLFTWS*
>CALMACT00000003155|CALMACG00000002001|Seq_id=308_quiver|type=cds
PTQKTQTSLFTKHKLKFPLNTEEDISKFEDVLKSEEEFNAACDELARFGGSNIYNFIKRT
LTALLSNEQAKNYSLKGRKGKKPFEKLLLARLLICAAEKSLNTTQKAVEDAICSWLKRAP
ERLQGYRKKFP*
>CALMACT00000032398|CALMACG00000019612|Seq_id=308_quiver|type=cds
MHSPYSVDTQNNQPREENSTLRTSIYCSSLIHERHLFLITATAENPASYKYDVANEIKIL
EAFEVRFGWILNR*
>CALMACT00000035072|CALMACG00000021205|Name=r3hcc1l|Seq_id=428_quiver|type=cds
MAHHLMMIMFAAQKAVSLTSPIIKARPMTAASRATLNTAQRHDLKPAMKRPQTNLQTARR
LITTHLGKKSRISQEQSAQERNDLRTAREQKRLVKKNEQDAWEGNLRSSLN*
>CALMACT00000006722|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MRAKKGRNFASVQSPAHQMSEDIKTITEYALKSKTLEEIDIDIASYNLKPCCANVVREIM
DLTAFDNAVLSAQYSDIWKQERQFRITGTRCYSVYTFAKDNWSTMTRNFFWPKPFTSRYT
DHGIKYEKEALIKYTRSNNYKVVELGLVICKQLPWIAYSPDGVVMADGAPTRLVEIKCPY
DGILPADNLKVLT*
>CALMACT00000006723|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MSDKSVFLTLSSVFKYLCANTDSRCITEGEEILNANHIILMGVVSDSEEYVELKGRNFAS
VQSPAHQMSEDIKTITEYALKSKTLEEIDIDIASYNLKPCCANVVREIMDLTAFDNAVLS
AQYSDIWKQERQFRITGTRCYSVYTFAKDNWSTMTRNFFWPKPFTSRYTDHGIKYEKEAL
IKYTRSNNYKVVELGLVICKQLPWIAYSPDGVVMADGAPTRLVEIKCPYDGILPADNLKV
LT*
>CALMACT00000006724|CALMACG00000004219|Seq_id=1904_quiver|type=cds
MSDKSVFLTLSSVFKYLCANTDSRCITEGEEILNANHIILMGVVSDSEEYVELSSIASLE
ILSSTDVKCYWSYKKKAVEEQ*
>CALMACT00000029885|CALMACG00000018110|Seq_id=1904_quiver|type=cds
MGKPVTASMCVCSKHFRKEDYCAKDIQMNRPKLKRFAVPSLNLPKRKVYLEEHRHSSLGR
EERYAKRQKVQELQDANIVDAPLCDGSEEAVQLTVVPTPYGDLTEEEKVAVQNLLLLSSR
VAKGLVDKEVQVSSGDIIITFSSTIKEDRHLNSLTGIPSFTLLNKLALLVKKNYPDIKKH
KLSIIDRIVLVFMKLKMGLKFNVLSFLFKICSASCKIIFVEYVGKLANILKSCIVWPSYE
ECQQNIPSCFIDFKSVRTVLDCIEIPIEKPKCLCCRIRTYSHYKGRQTLKIMTGVSPAGL
ITFLSKSYGGRTSDKAIFEQSHLINKLQSRQDSLMVDKGFLIDKICKDHFIKLVRPHFLS
KKKQFSAEESKENISISRARVHIERVNQRMRIFNILNNPLPNALIPCIDNILVIICGMVN
LQTPILSAGKF*
What did you try? What did you find on this forum?
This is a very useful site for me.
seqkit is great for FASTA files. In particular, the seqkit grep subcommand can be used to search either FASTA headers or sequences.
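If seqkit is not available, the same kind of selection can be sketched in a few lines of Python. Plain grep is awkward here because each sequence wraps over a variable number of lines, so matching a header line does not tell you how many following lines to keep. The sketch below assumes the ID you want is the first "|"-separated field of the header (for example CALMACT00000007948); the function names and the small EXAMPLE string are made up for illustration, and the records in it are copied from the question.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_id(lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first '|'-separated header field is in `wanted`.

    Assumption: the transcript ID is the first field of the header.
    """
    for header, seq in read_fasta(lines):
        transcript_id = header.split("|", 1)[0]
        if transcript_id in wanted:
            yield header, seq


# A few records copied from the question, used only as demo input.
EXAMPLE = """\
>CALMACT00000025247|CALMACG00000015281|Seq_id=6096_quiver|type=cds
HGHPCLLNAPYLNCRSKNRGTQNFELPHNGLLTLFLDDCHALYLL*
>CALMACT00000007948|CALMACG00000004977|Seq_id=308_quiver|type=cds
MKNLRMEGKEIHQTGKGITQKGKLKKAKTMKDVECKCHYNCNNSISKEARHK
>CALMACT00000032398|CALMACG00000019612|Seq_id=308_quiver|type=cds
MHSPYSVDTQNNQPREENSTLRTSIYCSSLIHERHLFLITATAENPASYKYDVANEIKIL
EAFEVRFGWILNR*
"""

wanted_ids = {"CALMACT00000007948", "CALMACT00000032398"}
for header, seq in select_by_id(EXAMPLE.splitlines(), wanted_ids):
    # Print each selected record back out in FASTA form (sequence on one line).
    print(f">{header}")
    print(seq)
```

For a real file, pass an open file handle instead of EXAMPLE.splitlines(). With seqkit itself, something along the lines of seqkit grep -r -p CALMACT00000007948 proteins.fa should do the same job (proteins.fa is a placeholder file name); check seqkit grep --help for the exact options in your version.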