Hi all,
I have a FASTA file containing protein sequences for a large number of D. melanogaster genes, and I need to split it into multiple FASTA files, one gene per file. What's the best way to go about this? Ideally, each file would be named after the gene's UniProt ID.
For example, the two sequences below should be split into two files, the first called O46197.fasta and the second called Q9VUQ5.fasta.
Many thanks!
>sp|O46197|A29AB_DROME Accessory gland protein Acp29AB OS=Drosophila melanogaster GN=Acp29AB PE=2 SV=2
MYASNLLYLLALWNLWDLSGGQQDIPNGKATLPSPQTPQNTIDQIGINQNYWFTYNALKQ
NETLAIIDTMEMRIASSLLEFKAQMEIQLQPLKIIMRHHASNIKASNNIKMRRFEKVGSR
HFHIEKNLMQTWFEAYVTCRKMNGHLANIQDEMELDGILALAPNNSYWIDISKLVENGGT
FVSTLTGREPFFVKWKSNQDTKKKNQCVYIYAKEMSYDECFEKKSFVCQADQWA
>sp|Q9VUQ5|AGO2_DROME Protein argonaute-2 OS=Drosophila melanogaster GN=AGO2 PE=1 SV=3
MGKKDKNKKGGQDSAAAPQPQQQQKQQQQRQQQPQQLQQPQQLQQPQQLQQPQQQQQQQP
HQQQQQSSRQQPSTSSGGSRASGFQQGGQQQKSQDAEGWTAQKKQGKQQVQGWTKQGQQG
GHQQGRQGQDGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQ
QRPSGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQG
RQGQEGGYQQRPPGQQQGGHQQGRQGQEGGYQQRPPGQQQGGHEQGRQGQEGGYQQRPSG
QQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQGRQGQEGGYQQRPSGQQQGGHQQGRQGQE
GGYQQRPPGQQPNQTQSQGQYQSRGPPQQQQAAPLPLPPQPAGSIKRGTIGKPGQVGINY
LDLDLSKMPSVAYHYDVKIMPERPKKFYRQAFEQFRVDQLGGAVLAYDGKASCYSVDKLP
LNSQNPEVTVTDRNGRTLRYTIEIKETGDSTIDLKSLTTYMNDRIFDKPMRAMQCVEVVL
ASPCHNKAIRVGRSFFKMSDPNNRHELDDGYEALVGLYQAFMLGDRPFLNVDISHKSFPI
SMPMIEYLERFSLKAKINNTTNLDYSRRFLEPFLRGINVVYTPPQSFQSAPRVYRVNGLS
RAPASSETFEHDGKKVTIASYFHSRNYPLKFPQLHCLNVGSSIKSILLPIELCSIEEGQA
LNRKDGATQVANMIKYAATSTNVRKRKIMNLLQYFQHNLDPTISRFGIRIANDFIVVSTR
VLSPPQVEYHSKRFTMVKNGSWRMDGMKFLEPKPKAHKCAVLYCDPRSGRKMNYTQLNDF
GNLIISQGKAVNISLDSDVTYRPFTDDERSLDTIFADLKRSQHDLAIVIIPQFRISYDTI
KQKAELQHGILTQCIKQFTVERKCNNQTIGNILLKINSKLNGINHKIKDDPRLPMMKNTM
YIGADVTHPSPDQREIPSVVGVAASHDPYGASYNMQYRLQRGALEEIEDMFSITLEHLRV
YKEYRNAYPDHIIYYRDGVSDGQFPKIKNEELRCIKQACDKVGCKPKICCVIVVKRHHTR
FFPSGDVTTSNKFNNVDPGTVVDRTIVHPNEMQFFMVSHQAIQGTAKPTRYNVIENTGNL
DIDLLQQLTYNLCHMFPRCNRSVSYPAPAYLAHLVAARGRVYLTGTNRFLDLKKEYAKRT
IVPEFMKKNPMYFV
Cross-posted at http://seqanswers.com/forums/showthread.php?p=110623
Duplicate of many previous questions; search this site for "split fasta".
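For reference, here is a minimal Python sketch of one way to do the split, using only the standard library. It assumes the headers follow the UniProt layout shown in the question (>sp|ACCESSION|ENTRY_NAME ...) and takes the second "|"-separated field as the file name; for any other header it falls back to the first word of the header. The function name split_fasta and the fallback name "unnamed" are my own choices for illustration, not part of any library.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to its own <ID>.fasta file.

    Returns the list of output paths in the order the records were read.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None  # handle of the record currently being written
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous output file.
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    token = fields[0] if fields else "unnamed"
                    # Assumed UniProt layout: db|ACCESSION|ENTRY_NAME
                    parts = token.split("|")
                    if len(parts) >= 3 and parts[1]:
                        name = parts[1]
                    else:
                        name = token
                    path = out_dir / f"{name}.fasta"
                    # Mode "w" means a repeated ID overwrites the earlier file.
                    out = open(path, "w")
                    written.append(path)
                    out.write(line)
                elif out is not None and line.strip():
                    # Sequence line: copy unchanged; blank lines are skipped,
                    # as is anything that appears before the first header.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_fasta("proteins.fasta", "per_gene") (both names are placeholders) on the example in the question would be expected to create per_gene/O46197.fasta and per_gene/Q9VUQ5.fasta. Non-header lines are copied as they are, so any stray separator line between records would end up in the preceding file.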