I have a multi-FASTA file of protein sequences with long headers, but I want to discard everything in each header except the gene name, which appears after 'GN='. Can anyone help with this, please?
sp|P0AGM2|YICG_ECOLI UPF0126 inner membrane protein YicG OS=Escherichia coli (strain K12) OX=83333 GN=yicG PE=1 SV=1
MLLHILYLVGITAEAMTGALAAGRRRMDTFGVIIIATATAIGGGSVRDILLGHYPLGWVK
HPEYVIIVATAAVLTTIVAPVMPYLRKVFLVLDALGLVVFSIIGAQVALDMGHGPIIAVV
AAVTTGVFGGVLRDMFCKRIPLVFQKELYAGVSFASAVLYIALQHYVSNHDVVIISTLVF
GFFARLLALRLKLGLPVFYYSHEGH
sp|P64442|YCEO_ECOLI Uncharacterized protein YceO OS=Escherichia coli (strain K12) OX=83333 GN=yceO PE=1 SV=1
MRPFLQEYLMRRLLHYLINNIREHLMLYLFLWGLLAIMDLIYVFYF
I want output like this (with the > symbol):
yicG
MLLHILYLVGITAEAMTGALAAGRRRMDTFGVIIIATATAIGGGSVRDILLGHYPLGWVK
HPEYVIIVATAAVLTTIVAPVMPYLRKVFLVLDALGLVVFSIIGAQVALDMGHGPIIAVV
AAVTTGVFGGVLRDMFCKRIPLVFQKELYAGVSFASAVLYIALQHYVSNHDVVIISTLVF
GFFARLLALRLKLGLPVFYYSHEGH
yceO
MRPFLQEYLMRRLLHYLINNIREHLMLYLFLWGLLAIMDLIYVFYF
Note: Sorry, the > symbol was present in all the FASTA headers, but it is not appearing on Biostars (maybe I don't know how to post it properly).
Thanks in advance.
You can also try this:
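A minimal Python sketch, assuming every header line in the real file starts with `>` and that the gene name is the run of non-space characters right after `GN=`. The function name `rename_headers` and the file names in the usage note are hypothetical, chosen only for illustration.

```python
import re

# The gene name is taken to be the non-whitespace run following "GN=".
GN_PATTERN = re.compile(r"GN=(\S+)")


def rename_headers(lines):
    """Yield FASTA lines with each header reduced to '>' plus its GN= value."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = GN_PATTERN.search(line)
            # Headers without a GN= field are passed through unchanged
            # (an assumption; you may prefer to skip or flag them).
            yield f">{match.group(1)}" if match else line
        else:
            # Sequence lines are kept exactly as they are.
            yield line
```

To use it on a file, something like `with open("proteins.fasta") as fh: print("\n".join(rename_headers(fh)))` should work, with the output redirected to a new file. Note that some UniProt entries have no `GN=` field, and different proteins can share a gene name, so it is worth checking the result for unchanged or duplicate headers.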