I have a fasta file like below contains the longest and the shortest isoforms of genes. I would like to know is there any way to compare those isoforms that related to a specific gene in this file (e.g. compare the longest isoform of VPS54 gene against the shortest one). or I have to save each isoform of a gene in a separate file, then compare them?
>ENSG00000143952|VPS54|ENST00000409558|2|63892146|64019072
MASSHSSSPVPQGSSSDVFFKIEVDPSKHIRPVPSLPDVCPKEPTVTDQHRWTVYHSKVN
LPAALNDPRLAKRESDFFTKTWGLDFVDTEVIPSFYLPQISKEHFTVYQQEISQREKIHE
RCKNICPPKDTFERTLLHTHDKSRTDLEQVPKIFMKPDFALDDSLTFNSVLPWSHFNTAG
GKGNRDAASSKLLQEKLSHYLDIVEVNIAHQISLRSEAFFHAMTSQHELQDYLRKTSQAV
KMLRDKIAQIDKVMCEGSLHILRLALTRNNCVKVYNKLKLMATVHQTQPTVQVLLSTSEF
VGALDLIATTQEVLQQELQGIHSFRHLGSQLCELEKLIDKMMIAEFSTYSHSDLNRPLED
DCQVLEEERLISLVFGLLKQRKLNFLEIYGEKMVITAKNIIKQCVINKVSQTEEIDTDVV
VKLADQMRMLNFPQWFDLLKDIFSKFTIFLQRVKATLNIIHSVVLSVLDKNQRTRELEEI
SQQKNAAKDNSLDTEVAYLIHEGMFISDAFGEGELTPIAVDTTSQRNASPNSEPCSSDSV
SEPECTTDSSSSKEHTSSSAIPGGVDIMVSEDMKLTDSELGKLANNIQELLYSASDICHD
RAVKFLMSRAKDGFLEKLNSMEFITLSRLMETFILDTEQICGRKSTSLLGALQSQAIKFV
NRFHEERKTKLSLLLDNERWKQADVPAEFQDLVDSLSDGKIALPEKKSGATEERKPAEVL
IVEGQQYAVVGTVLLLIRIILEYCQCVDNIPSVTTDMLTRLSDLLKYFNSRSCQLVLGAG
ALQVVGLKTITTKNLALSSRCLQLIVHYIPVIRAHFEARLPPKQYSMLRHFDHITKDYHD
HIAEISAKLVAIMDSLFDKLLSKYEVKAPVPSACFRNICKQMTKMHEAIFDLLPEEQTQM
LFLRINASYKLHLKKQLSHLNVINDGGPQNGLVTADVAFYTGNLQALKGLKDLDLNMAEI
WEQKR*
>ENSG00000143952|VPS54|ENST00000272322|2|63892150|64019428
MASSHSSSPVPQGSSSDVFFKIEVDPSKHIRPVPSLPDVCPKEPTGDSHSLYVAPSLVTD
QHRWTVYHSKVNLPAALNDPRLAKRESDFFTKTWGLDFVDTEVIPSFYLPQISKEHFTVY
QQEISQREKIHERCKNICPPKDTFERTLLHTHDKSRTDLEQVPKIFMKPDFALDDSLTFN
SVLPWSHFNTAGGKGNRDAASSKLLQEKLSHYLDIVEVNIAHQISLRSEAFFHAMTSQHE
LQDYLRKTSQAVKMLRDKIAQIDKVMCEGSLHILRLALTRNNCVKVYNKLKLMATVHQTQ
PTVQVLLSTSEFVGALDLIATTQEVLQQELQGIHSFRHLGSQLCELEKLIDKMMIAEFST
YSHSDLNRPLEDDCQVLEEERLISLVFGLLKQRKLNFLEIYGEKMVITAKNIIKQCVINK
VSQTEEIDTDVVVKLADQMRMLNFPQWFDLLKDIFSKFTIFLQRVKATLNIIHSVVLSVL
DKNQRTRELEEISQQKNAAKDNSLDTEVAYLIHEGMFISDAFGEGELTPIAVDTTSQRNA
SPNSEPCSSDSVSEPECTTDSSSSKEHTSSSAIPGGVDIMVSEDMKLTDSELGKLANNIQ
ELLYSASDICHDRAVKFLMSRAKDGFLEKLNSMEFITLSRLMETFILDTEQICGRKSTSL
LGALQSQAIKFVNRFHEERKTKLSLLLDNERWKQADVPAEFQDLVDSLSDGKIALPEKKS
GATEERKPAEVLIVEGQQYAVVGTVLLLIRIILEYCQCVDNIPSVTTDMLTRLSDLLKYF
NSRSCQLVLGAGALQVVGLKTITTKNLALSSRCLQLIVHYIPVIRAHFEARLPPKQYSML
RHFDHITKDYHDHIAEISAKLVAIMDSLFDKLLSKYEVKAPVPSACFRNICKQMTKMHEA
IFDLLPEEQTQMLFLRINASYKLHLKKQLSHLNVINDGGPQNGLVTADVAFYTGNLQALK
GLKDLDLNMAEIWEQKR*
>ENSG00000014641|MDH1|ENST00000539945|2|63589151|63607194
MRRCSYFPKDVTVFDKDDKSEPIRVLVTGAAGQIAYSLLYSIGNGSVFGKDQPIILVLLD
ITPMMGVLDGVLMELQDCALPLLKDVIATDKEDVAFKDLDVAILVGSMPRREGMERKDLL
KANVKIFKSQGAALDKYAKKSVKVIVVGNPANTNCLTASKSAPSIPKENFSCLTRLDHNR
AKAQIALKLGVTANDVKNVIIWGNHSSTQYPDVNHAKVKLQGKEVGVYEALKDDSWLKGE
FVTTVQQRGAAVIKARKLSSAMSAAKAICDHVRDIWFGTPEGEFVSMGVISDGNSYGVPD
DLLYSFPVVIKNKTWKFVEGLPINDFSREKMDLTAKELTEEKESAFEFLSSA*
>ENSG00000014641|MDH1|ENST00000233114|2|63588963|63607197
MSEPIRVLVTGAAGQIAYSLLYSIGNGSVFGKDQPIILVLLDITPMMGVLDGVLMELQDC
ALPLLKDVIATDKEDVAFKDLDVAILVGSMPRREGMERKDLLKANVKIFKSQGAALDKYA
KKSVKVIVVGNPANTNCLTASKSAPSIPKENFSCLTRLDHNRAKAQIALKLGVTANDVKN
VIIWGNHSSTQYPDVNHAKVKLQGKEVGVYEALKDDSWLKGEFVTTVQQRGAAVIKARKL
SSAMSAAKAICDHVRDIWFGTPEGEFVSMGVISDGNSYGVPDDLLYSFPVVIKNKTWKFV
EGLPINDFSREKMDLTAKELTEEKESAFEFLSSA*
You could simply
blat
the same file against itself. It will give you all possible combinations automgically.Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.