I have two fasta file with 200 sequences. I want to use shell commands to find identical sequences with different headers between theses two fasta files and save that in new file with heder as output.
forexample:
file1:
>NP_000009.1 very long-chain specific acyl-CoA dehydrogenase, mitochondrial isoform 1 precursor [Homo sapiens]
MQAARMAASLGRQLLRLGGGSSRLTALLGQPRPGPARRPYAGGAAQLALDKSDSHPSDALTRKKPAKAES
KSFAVGMFKGQLTTDQVFPYPSVLNEEQTQFLKELVEPVSRFFEEVNDPAKNDALEMVEETTWQGLKELG
AFGLQVPSELGGVGLCNTQYARLVEIVGMHDLGVGITLGAHQSIGFKGILLFGTKAQKEKYLPKLASGET
VAAFCLTEPSSGSDAASIRTSAVPSPCGKYYTLNGSKLWISNGGLADIFTVFAKTPVTDPATGAVKEKIT
AFVVERGFGGITHGPPEKKMGIKASNTAEVFFDGVRVPSENVLGEVGSGFKVAMHILNNGRFGMAAALAG
TMRGIIAKAVDHATNRTQFGEKIHNFGLIQEKLARMVMLQYVTESMAYMVSANMDQGATDFQIEAAISKI
FGSEAAWKVTDECIQIMGGMGFMKEPGVERVLRDLRIFRIFEGTNDILRLFVALQGCMDKGKELSGLGSA
LKNPFGNAGLLLGEAGKQLRRRAGLGSGLSLSGLVHPELSRSGELAVRALEQFATVVEAKLIKHKKGIVN
EQFLLQRLADGAIDLYAMVVVLSRASRSLSEGHPTAQHEKMLCDTWCIEAAARIREGMAALQSDPWQQEL
YRNFKSISKALVERGGVVTSNPLGF
>NP_000010.1 acetyl-CoA acetyltransferase, mitochondrial precursor [Homo sapiens]
MAVLAALLRSGARSRSPLLRRLVQEIRYVERSYVSKPTLKEVVIVSATRTPIGSFLGSLSLLPATKLGSI
AIQGAIEKAGIPKEEVKEAYMGNVLQGGEGQAPTRQAVLGAGLPISTPCTTINKVCASGMKAIMMASQSL
MCGHQDVMVAGGMESMSNVPYVMNRGSTPYGGVKLEDLIVKDGLTDVYNKIHMGSCAENTAKKLNIARNE
QDAYAINSYTRSKAAWEAGKFGNEVIPVTVTVKGQPDVVVKEDEEYKRVDFSKVPKLKTVFQKENGTVTA
ANASTLNDGAAALVLMTADAAKRLNVTPLARIVAFADAAVEPIDFPIAPVYAASMVLKDVGLKKEDIAMW
EVNEAFSLVVLANIKMLEIDPQKVNINGGAVSLGHPIGMSGARIVGHLTHALKQGEYGLASICNGGGGAS
AMLIQKL
file2:
>sp|Q8R519|ACMSD_MOUSE 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase OS=Mus musculus GN=Acmsd PE=1 SV=2
MKIDIHTHILPKEWPDLEKRFGYGGWVQLQQQGKGEAKMIKDGKLFRVIQQNCWDPEVRI
REMNQKGVTVQALSTVPVMFSYWAKPKDTLELCQFLNNDLAATVARYPRRFVGLGTLPMQ
APELAVEEMERCVKALGFPGIQIGSHINTWDLNDPELFPIYAAAERLNCSLFVHPWDMQM
DGRMAKYWLPWLVGMPSETTMAICSMIMGGVFEKFPKLKVCFAHGGGAFPFTIGRIAHGF
NMRPDLCAQDNPSDPRKYLGSFYTDSLVHDPLSLKLLTDVIGKDKVMLGTDYPFPLGEQE
PGKLIESMAEFDEETKDKLTAGNALAFLGLERKLFE
>sp|P35738|ODBB_RAT 2-oxoisovalerate dehydrogenase subunit beta, mitochondrial OS=Rattus norvegicus GN=Bckdhb PE=1 SV=3
MQAARMAASLGRQLLRLGGGSSRLTALLGQPRPGPARRPYAGGAAQLALDKSDSHPSDALTRKKPAKAES
KSFAVGMFKGQLTTDQVFPYPSVLNEEQTQFLKELVEPVSRFFEEVNDPAKNDALEMVEETTWQGLKELG
AFGLQVPSELGGVGLCNTQYARLVEIVGMHDLGVGITLGAHQSIGFKGILLFGTKAQKEKYLPKLASGET
VAAFCLTEPSSGSDAASIRTSAVPSPCGKYYTLNGSKLWISNGGLADIFTVFAKTPVTDPATGAVKEKIT
AFVVERGFGGITHGPPEKKMGIKASNTAEVFFDGVRVPSENVLGEVGSGFKVAMHILNNGRFGMAAALAG
TMRGIIAKAVDHATNRTQFGEKIHNFGLIQEKLARMVMLQYVTESMAYMVSANMDQGATDFQIEAAISKI
FGSEAAWKVTDECIQIMGGMGFMKEPGVERVLRDLRIFRIFEGTNDILRLFVALQGCMDKGKELSGLGSA
LKNPFGNAGLLLGEAGKQLRRRAGLGSGLSLSGLVHPELSRSGELAVRALEQFATVVEAKLIKHKKGIVN
EQFLLQRLADGAIDLYAMVVVLSRASRSLSEGHPTAQHEKMLCDTWCIEAAARIREGMAALQSDPWQQEL
YRNFKSISKALVERGGVVTSNPLGF
>sp|P26149|3BHS2_MOUSE 3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase type 2 OS=Mus musculus GN=Hsd3b2 PE=1 SV=4
MPGWSCLVTGAGGFLGQRIIQLLVQEEDLEEIRVLDKVFRPETRKEFFNLETSIKVTVLE
GDILDTQYLRRACQGISVVIHTAAIIDVTGVIPRQTILDVNLKGTQNLLEACIQASVPAF
IFSSSVDVAGPNSYKEIVLNGHEEECHESTWSDPYPYSKKMAEKAVLAANGSMLKNGGTL
QTCALRPMCIYGERSPLISNIIIMALKHKGILRSFGKFNTANPVYVGNVAWAHILAARGL
RDPKKSPNIQGEFYYISDDTPHQSFDDISYTLSKEWGFCLDSSWSLPVPLLYWLAFLLET
VSFLLSPIYRYIPPFNRHLVTLSGSTFTFSYKKAQRDLGYEPLVSWEEAKQKTSEWIGTL
VEQHRETLDTKSQ
new file:
>NP_000009.1 very long-chain specific acyl-CoA dehydrogenase, mitochondrial isoform 1 precursor [Homo sapiens]
MQAARMAASLGRQLLRLGGGSSRLTALLGQPRPGPARRPYAGGAAQLALDKSDSHPSDALTRKKPAKAES
KSFAVGMFKGQLTTDQVFPYPSVLNEEQTQFLKELVEPVSRFFEEVNDPAKNDALEMVEETTWQGLKELG
AFGLQVPSELGGVGLCNTQYARLVEIVGMHDLGVGITLGAHQSIGFKGILLFGTKAQKEKYLPKLASGET
VAAFCLTEPSSGSDAASIRTSAVPSPCGKYYTLNGSKLWISNGGLADIFTVFAKTPVTDPATGAVKEKIT
AFVVERGFGGITHGPPEKKMGIKASNTAEVFFDGVRVPSENVLGEVGSGFKVAMHILNNGRFGMAAALAG
TMRGIIAKAVDHATNRTQFGEKIHNFGLIQEKLARMVMLQYVTESMAYMVSANMDQGATDFQIEAAISKI
FGSEAAWKVTDECIQIMGGMGFMKEPGVERVLRDLRIFRIFEGTNDILRLFVALQGCMDKGKELSGLGSA
LKNPFGNAGLLLGEAGKQLRRRAGLGSGLSLSGLVHPELSRSGELAVRALEQFATVVEAKLIKHKKGIVN
EQFLLQRLADGAIDLYAMVVVLSRASRSLSEGHPTAQHEKMLCDTWCIEAAARIREGMAALQSDPWQQEL
YRNFKSISKALVERGGVVTSNPLGF
Jason : Please use
ADD COMMENT/ADD REPLY
when responding to existing posts to keep threads logically organized.