I am working with a text file containing sequences extracted, as required, from another file. The resulting text file has this format:
Zebrafish ESLLRFGLRSDLDFRLSLNGKEDLLDTGQSLSSCGVVSGDLISVILPASSQTSSAAHQTHTDQQSSQECVDLQQDCMDQQQQQEQECVCAAAPPLLCCEAEDGLLPLALERLLDSSTCRSPSDCLMLALHLLLLETGFIPQGGAVSSGEMPIGWQAAGVFRLQYVHPLLENSLVSVVAVPMGQTLVINAVLKMETSLENSRKLLLKPDEYVTAWTGGSSGVVYRDLRRLSRLVRDQLVYPLMATARQALGLPLLFGLPVLPPELLLRLLRLLDVRSLVSLSAVCRHLNTATHDASLWRHLLHRDFRVSFPAHRDTDWRELYKQKYRQRAARRGRHWFYPPPISPLIPFPSSPALYPPGIIGDYDQMPILPRPRFHPIGPLPGMSAPV
Fugu ETVLSVGLSAETEISLSLNGSEPLEDTGQTLASCGIVSGDLIRVALIRAADAPDRDDGGGHSEQVSQEAKLPDASGASTDSDQAPGPAASCWEPMLCSETDEGQAPWSLELLYHSAQVSGPGDALVVAANLLMIETGFSPQDSQLKPAEMPAGWRCGGVYKLQYSHRLCGDSVVVMVAVSMGSALIINGLLEVNQSADSVCKLCVDPSSYVTEWPGDSAAAAFKELNKLSRVFKDQVAYPLITAARHAMALPVAFGLTALPPELLLRVFRLLDVRSVVMLSAVCRHFGAITRDTALWRHLYCRDFRDSHAGSRDTDWKEVYRRSYKSRSAVRRSHECFLPPLYPNPRGVFTPPPPVPGIIGEYDQRPILPRPRYDPMSPFPDLDRQP
Chicken RALLAWGYSSDTEFSITLNGKDALTEDEKTLASYGIVPGDLICLLLEETDLPPPSSSPPSLQNGKNGSSLEFPSGLVPEDVDLEEGTGSYPSEPMLCSEAADGEIPHSLEVLYLSAECTSATDALIVLVHLLMMETGYVPQGTEAKAVSMPEKWRGNGVYKLQYTHPLCEEGSAGLTCVPLGDLVAINATLKINREIKGVKRIQLLPASFVCFQEPEKVAGVYKDLQKLSRLFKDQLVYSLLAAARQALNLPDVFGLVVLPLELKLRIFRLLDVRSLISLSAVCRDLYAASNDQLLWRFMYLRDFRDPIARPRDTDWKELYKKKLKQKEALRWRHMFLPPPFHPNPFYPSPFPIYPPMVIGEYGERPSLIPPHFDPIGSLPGANPTL
Zebra SMTENRTAGSDTAFSVTLNRKDALTEDQKTLASYGIVSGDLICLLLEEPDLPPPPATPAPLQNGNNGSSLEFPSGLVPEDADLEEGTGSYPSEPMLCSEAADGETPHSLEMLYLSAECTSATDALIVLVHLLMMETGYVPQGIEAKAVFMPEKWRGNGVYKLQYTHPLCGEGCAGLTCVPLGDLIAINATLKINEEIRSVKRIQLLPSSFVCFQDPEKVAGVYKDLQKLSRLFKDQLVYSLLAAARQALNLPDVFGLLVLPLELKLRIFRLLDVRSLISLSAVCRDLYTASNDQLLWRFMYLRDFRDPIARPRDTDWKELYKKKLKQKEALRWRHMMLLPPFHPNPFYPNPFPIYPPMIIGEYDERPSLIPPHFDPIGSLPGANPML
Anole QALLSWGYSSETKFEITLNNKDSLVGDQDTLASFGIVSGDLICLILEDDASSPSSSLPSSQSNHHSGPSQEFTSEGGPDDLDLQEATGSFPSEPMLCCEATDGQVPHSLQTLYHSAECTNANDALIVSIHLIMMETGYVPQGTEAKASSMPENWRNKGVYKLLYTHPLCENGFAVLTCVPLGNLIVVNAMLKITSDIKSVKRLQLLPTSFICFQDSANVVGVYKDLQKLSRLFKDRLVYPLLAAARQALNLPDVFGLVVLPLELKLRIFRLLDFRSLLSLSAVCHDLYAASNDQLLWRFIYLRDFRDPVARSRDTDWKELYKKKMKQKDALRWRHMMFLPPLHPNPLYPNPFPLYPPMIIGEMDERPSLFPSHLDPFGSFQNPNPTL
Human QSLLTWGYSSNTRFTITLNYKDPLTGDEETLASYGIVSGDLICLILQDDIIPSSTSEHSSLQNNSNGPSQNFEAESIQDNAHMAEGTGFYPSEPMLCSESVEGQVPHSLETLYQSADCSDANDALIVLIHLLMLESGYIPQGTEAKALSMPEKWKLSGVYKLQYMHPLCEGSSATLTCVPLGNLIVVNATLKINNEIRSVKRLQLLPESFICKKLGENVANIYKDLQKLSRLFKDQLVYPLLAFTRQALNLPDVFGLVVLPLELKLRIFRLLDVRSVLSLSAVCRDLFTASNDPLLWRFLYLRDFRDNTVRVQDTDWKELYRKRHIQRESPKGRFVMLLPPFYPNPLHPRPFPRLPPGIIGEYDQRPSLIPPRFDPVGPLPGPNPIL
I need to convert the sequences in this file to PHYLIP format (using Python code), like this:
6 387
Zebrafish ESLLRFGLRS DLDFRLSLNG KEDLLDTGQS LSSCGVVSGD LISVILPASS
Fugu ETVLSVGLSA ETEISLSLNG SEPLEDTGQT LASCGIVSGD LIRVALIRAA
Chicken RALLAWGYSS DTEFSITLNG KDALTEDEKT LASYGIVPGD LICLLLEETD
Zebra SMTENRTAGS DTAFSVTLNR KDALTEDQKT LASYGIVSGD LICLLLEEPD
Anole QALLSWGYSS ETKFEITLNN KDSLVGDQDT LASFGIVSGD LICLILEDDA
Human QSLLTWGYSS NTRFTITLNY KDPLTGDEET LASYGIVSGD LICLILQDDI
QTSSAAHQTH TDQQSSQECV DLQQDCMDQQ QQQEQECVCA AAPPLLCCEA
DAPDRDDGGG HSEQVSQEAK LPDASGASTD SDQAPGPAAS CWEPMLCSET
LPPPSSSPPS LQNGKNGSSL EFPSGLVPED VDLEEGTGSY PSEPMLCSEA
LPPPPATPAP LQNGNNGSSL EFPSGLVPED ADLEEGTGSY PSEPMLCSEA
SSPSSSLPSS QSNHHSGPSQ EFTSEGGPDD LDLQEATGSF PSEPMLCCEA
IPSSTSEHSS LQNNSNGPSQ NFEAESIQDN AHMAEGTGFY PSEPMLCSES
EDGLLPLALE RLLDSSTCRS PSDCLMLALH LLLLETGFIP QGGAVSSGEM
DEGQAPWSLE LLYHSAQVSG PGDALVVAAN LLMIETGFSP QDSQLKPAEM
ADGEIPHSLE VLYLSAECTS ATDALIVLVH LLMMETGYVP QGTEAKAVSM
ADGETPHSLE MLYLSAECTS ATDALIVLVH LLMMETGYVP QGIEAKAVFM
TDGQVPHSLQ TLYHSAECTN ANDALIVSIH LIMMETGYVP QGTEAKASSM
VEGQVPHSLE TLYQSADCSD ANDALIVLIH LLMLESGYIP QGTEAKALSM
PIGWQAAGVF RLQYVHPLLE NSLVSVVAVP MGQTLVINAV LKMETSLENS
PAGWRCGGVY KLQYSHRLCG DSVVVMVAVS MGSALIINGL LEVNQSADSV
PEKWRGNGVY KLQYTHPLCE EGSAGLTCVP LGDLVAINAT LKINREIKGV
PEKWRGNGVY KLQYTHPLCG EGCAGLTCVP LGDLIAINAT LKINEEIRSV
PENWRNKGVY KLLYTHPLCE NGFAVLTCVP LGNLIVVNAM LKITSDIKSV
PEKWKLSGVY KLQYMHPLCE GSSATLTCVP LGNLIVVNAT LKINNEIRSV
RKLLLKPDEY VTAWTGGSSG VVYRDLRRLS RLVRDQLVYP LMATARQALG
CKLCVDPSSY VTEWPGDSAA AAFKELNKLS RVFKDQVAYP LITAARHAMA
KRIQLLPASF VCFQEPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
KRIQLLPSSF VCFQDPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
KRLQLLPTSF ICFQDSANVV GVYKDLQKLS RLFKDRLVYP LLAAARQALN
KRLQLLPESF ICKKLGENVA NIYKDLQKLS RLFKDQLVYP LLAFTRQALN
LPLLFGLPVL PPELLLRLLR LLDVRSLVSL SAVCRHLNTA THDASLWRHL
LPVAFGLTAL PPELLLRVFR LLDVRSVVML SAVCRHFGAI TRDTALWRHL
LPDVFGLVVL PLELKLRIFR LLDVRSLISL SAVCRDLYAA SNDQLLWRFM
LPDVFGLLVL PLELKLRIFR LLDVRSLISL SAVCRDLYTA SNDQLLWRFM
LPDVFGLVVL PLELKLRIFR LLDFRSLLSL SAVCHDLYAA SNDQLLWRFI
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLFTA SNDPLLWRFL
LHRDFRVSFP AHRDTDWREL YKQKYRQRAA RRGRHWFYPP PISPLIPFPS
YCRDFRDSHA GSRDTDWKEV YRRSYKSRSA VRRSHECFLP PLYPNPRGVF
YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMFLPP PFHPNPFYPS
YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMMLLP PFHPNPFYPN
YLRDFRDPVA RSRDTDWKEL YKKKMKQKDA LRWRHMMFLP PLHPNPLYPN
YLRDFRDNTV RVQDTDWKEL YRKRHIQRES PKGRFVMLLP PFYPNPLHPR
SPALYPPGII GDYDQMPILP RPRFHPIGPL PGMSAPV
TPPPPVPGII GEYDQRPILP RPRYDPMSPF PDLDRQP
PFPIYPPMVI GEYGERPSLI PPHFDPIGSL PGANPTL
PFPIYPPMII GEYDERPSLI PPHFDPIGSL PGANPML
PFPLYPPMII GEMDERPSLF PSHLDPFGSF QNPNPTL
PFPRLPPGII GEYDQRPSLI PPRFDPVGPL PGPNPIL
Can I get some guidelines please?
I just need to process the file so that the result is a PHYLIP-formatted file. I do not need to perform an alignment.
Are you saying that I first have to convert the file to FASTA format and then convert that to PHYLIP format? Note: I am using Windows, not Linux.
Personally, I think that's the easiest way to do it. Trying to write a script that handles all the whitespace and sequence wrapping to go directly from your file to PHYLIP strikes me as a nightmare. PHYLIP is also a very strict file format, meaning you'd have to get exactly the right amount of whitespace in all the right places.
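As a sketch of the first step of that route (text to FASTA) in plain Python, with no sed needed: the function below assumes each non-blank line is a name without spaces, followed by whitespace and the sequence, as in the question's file. The name text_to_fasta is hypothetical and is not taken from the answer's code.

```python
def text_to_fasta(lines):
    """Convert 'Name SEQUENCE' lines into FASTA-formatted text.

    Assumes names contain no spaces; blank lines are skipped and any
    whitespace inside a sequence is removed.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        name, seq = fields[0], "".join(fields[1:])
        records.append(">{}\n{}\n".format(name, seq))
    return "".join(records)


# Example with two shortened records from the question's file.
sample = ["Zebrafish ESLLRFGLRS\n", "\n", "Fugu ETVLSVGLSA\n"]
print(text_to_fasta(sample), end="")
```

To convert a whole file, pass an open file object (for example from open("sequences.txt"), a hypothetical name) instead of the sample list, and write the returned string to a .fasta file.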
On Windows you will probably want to install Cygwin or the Windows Subsystem for Linux to do most of this; bioinformatics is just straight up harder on Windows (that information would also have been useful in your first post).
If you wanted to, you could replace the sed steps with pure Python fairly easily, if you already have Python available (you will also need to install Biopython).
Thank you for the help.
I want to use Python code for this purpose. For example, I was trying this:
But this converts my file to FASTA format, not PHYLIP format. I just need to arrange my sequences so they look like PHYLIP format; no alignment is required.
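If the goal is only to lay the sequences out in PHYLIP's interleaved form, as in the example in the question, that layout can also be written without Biopython. This is a minimal sketch, not the answerer's code: the function names are hypothetical, it assumes every sequence has the same length, and it follows the strict PHYLIP convention of padding or truncating names to 10 characters. The header line is the number of sequences followed by the sequence length, so the six 387-residue sequences in the question would start with 6 387.

```python
def read_records(lines):
    """Parse 'Name SEQUENCE' lines into (name, sequence) pairs."""
    records = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            records.append((fields[0], "".join(fields[1:])))
    return records


def to_phylip_interleaved(records, name_width=10, chunk=10, chunks_per_line=5):
    """Return interleaved PHYLIP text for (name, sequence) pairs.

    Names appear only in the first block, padded/truncated to name_width.
    Each line holds chunks_per_line groups of chunk residues (50 by default),
    and blocks are separated by a blank line.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    width = chunk * chunks_per_line  # residues per line

    out = ["{} {}".format(len(records), nchar)]
    for start in range(0, nchar, width):
        if start:
            out.append("")  # blank line between blocks
        for name, seq in records:
            piece = seq[start:start + width]
            groups = " ".join(piece[i:i + chunk] for i in range(0, len(piece), chunk))
            prefix = name[:name_width].ljust(name_width) if start == 0 else ""
            out.append(prefix + groups)
    return "\n".join(out) + "\n"
```

Usage, with hypothetical file names: read the records with read_records(open("sequences.txt")) and write the result of to_phylip_interleaved(...) to sequences.phy.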
Can you help?
Yes, as I already said in my answer: use Biopython and convert the FASTA to PHYLIP via the AlignIO module.
I've updated my answer with full code that goes directly from your input file to an output PHYLIP file using the approach I described.
Thanks so much for the response, but here is the error I get each time I try using this code:
What command are you using? It looks to me like you aren't passing the files in correctly. Please provide as much information as possible when you post; otherwise it makes more work for us to dig down to the root of the issue.
I am working on Windows, and for Python code I am using Python IDLE 3.6.4. When I run this script on the files I mentioned above, it displays an error (as I wrote earlier). The command I am using is the only one used to run the script, nothing else. I am not using Linux, so I write the code in a file.py and then run it to check the output. As far as I know, argv is a built-in array of Linux, so I guess it does not work when I run the script in Python IDLE (3.6.4).
Here's what I get using your data and that exact code (with the modification above for Python 3):
Command
Output:
Yes, this is why I asked you to post the command: please copy and paste the exact command you typed.
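For what it's worth, sys.argv belongs to Python's standard sys module and exists on Windows as well as Linux: it is a list whose first item is the script name, followed by whatever was typed after it on the command line. When a script is run from IDLE 3.6 with F5 (Run Module), the list holds only the script path, so code that reads sys.argv[1] raises an IndexError there. A small guard like the one below (hypothetical function name and usage text) turns that into a readable message.

```python
import sys


def get_paths(argv=None):
    """Return (input_path, output_path) from an argv-style list.

    argv defaults to sys.argv, e.g. ["convert.py", "in.txt", "out.phy"]
    for the command: python convert.py in.txt out.phy
    """
    argv = sys.argv if argv is None else argv
    if len(argv) != 3:
        # Exit with a usage message instead of an IndexError.
        script = argv[0] if argv else "convert.py"
        raise SystemExit("usage: python {} INPUT OUTPUT".format(script))
    return argv[1], argv[2]
```

In the script itself, calling input_path, output_path = get_paths() near the top picks up the two file names typed after the script name.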
sys.argv[] doesn't cause an error under Python 3. Something that will, however, is that in Python 3 strings are already stored as Unicode and do not need conversion, therefore the line:
can be changed to remove the unicode() call:
Command I was using:
This command works, but I don't want to use a command; instead, I want to take user input and produce a PHYLIP-format file from the input text file. How do I do that?
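One way to get file names from the user instead of from the command line is the built-in input() function, which also works inside IDLE: the prompts appear in the shell window. This is a minimal sketch with a hypothetical function name; the ask parameter exists only so the function can be tried without typing. It also strips surrounding double quotes, since Windows Explorer's "Copy as path" wraps paths in them.

```python
def prompt_for_paths(ask=input):
    """Ask for the input text file and the output PHYLIP file names."""
    # strip() removes stray spaces; strip('"') removes the quotes that
    # Windows "Copy as path" adds around a path.
    input_path = ask("Input text file: ").strip().strip('"')
    output_path = ask("Output PHYLIP file: ").strip().strip('"')
    if not input_path or not output_path:
        raise ValueError("both file names are required")
    return input_path, output_path
```

The two returned paths can then be used by the conversion code in place of sys.argv[1] and sys.argv[2].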
For example, if I use this code as:
It outputs the following error message:
I'm not sure I fully understand; do you want an interactive user menu? Personally, I find there is rarely a case where an interactive command line is preferable to just a single command/script.
If you want to read from STDIN, you can still do that with the sys module, but you'll need to change sys.argv[] to something using sys.stdin. If you want to go down that path, you'll need to do some reading of your own, as it will be much more complicated than the help this forum is set up to provide.
Here are some links that might get you started:
https://stackoverflow.com/questions/1450393/how-do-you-read-from-stdin-in-python
https://stackoverflow.com/questions/8878753/how-to-make-interactive-python-script-with-keyboard-arrows-navigation-in-menu
https://stackoverflow.com/questions/6218890/python-how-can-i-read-stdin-from-shell-and-send-stdout-to-shell-and-file
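A minimal sketch of the sys.stdin route mentioned above, assuming the same "Name SEQUENCE" input and writing FASTA to standard output; the function and script names are hypothetical. Because it reads stdin and writes stdout, no file names appear in the code: the shell supplies them through redirection, for example python to_fasta.py < sequences.txt > sequences.fasta, which also works in the Windows command prompt.

```python
import sys


def fasta_filter(src=None, dst=None):
    """Copy 'Name SEQUENCE' lines from src to dst as FASTA records.

    src and dst default to sys.stdin and sys.stdout.
    """
    src = sys.stdin if src is None else src
    dst = sys.stdout if dst is None else dst
    for line in src:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        dst.write(">" + fields[0] + "\n" + "".join(fields[1:]) + "\n")
```

In a script, a single call to fasta_filter() at the bottom is enough for the redirection above to work.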
Alright then, these links will definitely be useful. I will follow them for sure. Thank you so much for this guidance.
You cannot do:
That's not what square brackets are for.
You need to do this:
Command/input
Oh okay, I'll try it this way. Thank you for the help.
If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
How are you running it?
I ran it by giving the command:
This is the command that finally worked, and I converted the file to another format successfully.