Hi, I have a FASTA file and I want to extract only the sequences and the parts of the header on either side of the second '|' (e.g. P00533 EGFR_HUMAN Epidermal growth factor receptor), and only if OS=Homo sapiens. Is this possible with a quick awk one-liner?
>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor OS=Homo sapiens GN=EGFR PE=1 SV=2
MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEV
VLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALA
VLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDF
QNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGC
TGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYV
VTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFK
NCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAF
>sp|P31749|AKT1_HUMAN RAC-alpha serine/threonine-protein kinase OS=Homo sapiens GN=AKT1 PE=1
MSDVAIVKEGWLHKRGEYIKTWRPRYFLLKNDGTFIGYKERPQDVDQREAPLNNFSVAQC
QLMKTERPRPNTFIIRCLQWTTVIERTFHVETPEEREEWTTAIQTVADGLKKQEEEEMDF
RSGSPSDNSGAEEMEVSLAKPKHRVTMNEFEYLKLLGKGTFGKVILVKEKATGRYYAMKI
>tr|P91634|P91634_DROME PI-3 kinase OS=Drosophila melanogaster GN=Pi3K92E PE=1 SV=1
MNMMDNRALAYVAHQPKYETPPEEAEPPCMRFSVNLWKNEMLNWVDLICLLPNGFLLELR
VNPANTIQVIKVEMVNQAKQMPLGYVIKEACEYQVYGISTFNIEPYTDETKRLSEVQPYF
GILSLGERTDTTSFSSDYELTKMVNGMIGTTFDHNRTHGSPEIDDFRLYMTQTCDNIELE
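One way this could be approached with awk is sketched below. It is only a minimal sketch under a few assumptions: headers follow the UniProt layout shown above (db|accession|entry name, then the description, then " OS="), the organism is written exactly as "OS=Homo sapiens" (the match is case-sensitive, so "Homo Sapiens" would not match), and the function name extract_human is made up here for illustration.

```shell
# extract_human is a hypothetical helper name, not part of any tool.
# It reads FASTA from the file(s) given as arguments, or from stdin if none.
extract_human() {
    awk '
    /^>/ {
        # Header line: decide whether this record should be kept.
        keep = ($0 ~ /OS=Homo sapiens/)
        if (keep) {
            split($0, a, " OS=")   # a[1] = everything before " OS="
            split(a[1], b, "|")    # b[2] = accession, b[3] = entry name + description
            print ">" b[2] " " b[3]
        }
        next
    }
    keep   # Sequence line: printed only while the current record is kept.
    ' "$@"
}
```

It would be called as extract_human input.fasta > human.fasta, where both file names are placeholders. The keep flag is reset at every header, so the sequence lines of non-human records (the Drosophila entry above, for example) are skipped along with their header.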
Thank you very much Kevin. I'm new to bash, but I understand from your code how you split the line and print only the parts on either side of the '|' with 'split(a[1], b, "|"); print ">"b[2]" "b[3]}'. Can you explain what the bprint is doing?
Hey Zoe, could you repost this as a comment to my answer? Just to maintain fluidity of the thread. You can delete this and then re-post above. I have answered your comment already there!
Awesome, thank you Kevin!
Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked. If an answer was helpful you should upvote it; if the answer resolved your question you should mark it as accepted.