9.6 years ago
hb273
Hi all,
I have a FASTA file of protein domain sequences from different species (Pfam-A.fasta), and I need to split it into multiple FASTA files, one file per HUMAN domain. What is the best way to go about this? Ideally, each file would be named after the domain. I am using Perl.
Example sequences from the FASTA file:
>B4DW62_HUMAN/123-274 PF00198.17;2-oxoacid_dh;
WDGEGPKQLPFIDISVAVATDKGLLTPIIKDAAAKGIQEIADSVKALSKK
ARDGKLLPEEYQGGSFSISNLGMFGIDEFTAVINPPQACILAVGRFRPVL
KLTEDEEGNAKLQQRQLITVTMSSDSRVVDDELATRFLKSFKANLENPIR
LA
>D2HF00_AILME/272-501 PF00198.17;2-oxoacid_dh;
PGTFTEIPASNIRRVIAKRLTESKSTVPHAYATADCDLGAVLKARQSLVR
DDIKVSVNDFIIKAAAVTLKQMPDVNVSWDGEGPKQLPFIDISVAVATDK
GLITPIIKDAAAKGVQEIADSVKALSKKARDGKLLPEEYQGGSFSISNLG
MFGIDEFTAVINPPQACILAVGRFRPVLKLEQDEEGNARLQPHQLITVTM
SSDSRVVDDELATRFLENFKANLENPIRLA
Many Thanks
Hanadi
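A minimal Perl sketch of one way to do this, under a few assumptions that are not stated in the question: every header looks like the two examples above (`>ACCESSION_SPECIES/start-end PFxxxxx.version;domain_name;`), "one domain per file" means that all HUMAN sequences of the same domain go into one file named `<domain_name>.fasta`, and the output directory already exists. The sub names `parse_header` and `split_human_domains` are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Parse a header such as
#   >B4DW62_HUMAN/123-274 PF00198.17;2-oxoacid_dh;
# Returns (species, domain_name), or an empty list if the line does not match.
sub parse_header {
    my ($header) = @_;
    return unless $header =~ /^>\S+_([A-Z0-9]+)\/\d+-\d+\s+PF\d+\.\d+;([^;]+);/;
    return ($1, $2);
}

# Copy every HUMAN record from $fasta into "$outdir/<domain_name>.fasta".
# Returns the sorted list of domain names that were written.
sub split_human_domains {
    my ($fasta, $outdir) = @_;
    open my $in, '<', $fasta or die "Cannot open $fasta: $!";

    my (%seen, $out);
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # A new record starts: close the previous output file, if any.
            close $out if $out;
            undef $out;

            my ($species, $domain) = parse_header($line);
            next unless defined $species && $species eq 'HUMAN';

            # Keep the file name safe (assumption: replace unusual characters).
            (my $safe = $domain) =~ s/[^\w.-]/_/g;

            # Overwrite on first use in this run, append afterwards, so only
            # one output file is open at a time.
            my $mode = $seen{$safe}++ ? '>>' : '>';
            open $out, $mode, "$outdir/$safe.fasta"
                or die "Cannot write $outdir/$safe.fasta: $!";
        }
        # Header and sequence lines are printed only for HUMAN records.
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;

    my @names = sort keys %seen;
    return @names;
}

# Command-line use: perl script.pl Pfam-A.fasta [output_dir]
split_human_domains($ARGV[0], $ARGV[1] // '.') if @ARGV;
```

Saved under a name of your choice (for example `split_human_domains.pl`), it could be run as `perl split_human_domains.pl Pfam-A.fasta human_domains`. Opening and closing one file per record is slower than keeping handles open, but it avoids hitting the open-file limit when there are many different domains.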
Can you post your Perl code?
I am new to Perl.
I used this script to split the file: