9.6 years ago
fashiondesignrussian
How can I run a multiple alignment of WGS data in .gb and .fasta formats using Python, Ruby, or Java? Please recommend some packages and tutorials. I could not find a tutorial on multiple alignment and SNP calling using Python or Ruby. All I have managed so far is to use trial versions of DNAStar, RidomSeqSphere and NextGene. Is there similar free software, and a way to do this in a modern language? Thank you, folks.
Google for "biopython", specifically tutorials related to multiple alignments, which I recall it can produce. SNP calling is normally a different matter, though you could parse the multiple alignment and report the columns where the sequences differ if you really wanted. There may not be a tutorial for it, so you may have to figure it out yourself.
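To make the "parse the multiple alignment" idea concrete, here is a minimal sketch in plain Python. It assumes the alignment is already available as aligned FASTA text (all sequences the same length, gaps written as "-"). The function names read_aligned_fasta and variable_columns are made up for illustration and are not part of Biopython; treating gaps and N as "no information" is also just an assumption. This only lists variable columns, it is not a real SNP caller.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_columns(records, ignore="-N"):
    """Return a list of (position, {name: base}) for columns that vary.

    Positions are 1-based alignment columns. Characters listed in
    `ignore` (gaps and N by default) do not count as a difference.
    """
    names = list(records)
    lengths = {len(records[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    sites = []
    for i in range(lengths.pop()):
        column = {n: records[n][i] for n in names}
        observed = {base for base in column.values() if base not in ignore}
        if len(observed) > 1:
            sites.append((i + 1, column))
    return sites


if __name__ == "__main__":
    example = ">a\nACGT-A\n>b\nACCTTA\n>c\nACGTTN\n"
    for position, column in variable_columns(read_aligned_fasta(example)):
        print(position, column)
```

Note that the reported positions are alignment columns, not reference coordinates; mapping them back to a reference genome means skipping the gap characters in the reference row.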
I have BioPython and BioRuby, but they are not enough. Can you propose something more effective?
Are there any free alternatives to the software I mentioned in my question?
The tools you mention are all GUI tools. You say you have BioPython and BioRuby but that they are not enough, which is nearly impossible, seeing how they provide the means to work with almost all bioinformatics command-line tools. Quick question: how much programming experience do you have?
Yeah, if a GUI is needed then webtools should be used. There are web-based versions for many of the MSA tools.
Edit: Or there's Galaxy, which I presume also provides them.
But why use web-based tools in the first place when it's far more efficient and scalable to learn command line usage?
It's a question of how many times this needs to be done. If it's just a handful, then there's no point in bothering with any scripting or even the command line. If this needs to be done many times, then absolutely the command line or a specific script is needed.
Either of those should be sufficient. There are biopython tutorials on creating MSAs. Anything after that you might have to code a bit yourself (or not, it'll depend on what you want to do). Biopython itself is using freely available tools for all of this (biopython is just a convenient wrapper in this case).
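As a rough illustration of the "convenient wrapper" point, this is about all it takes to drive an external aligner from Python with the standard library. MAFFT is used here purely as an assumed example of a free aligner (nobody in this thread named it), and it has to be installed separately and be on PATH. The function names build_mafft_command and run_mafft are hypothetical; the --auto and --thread options are MAFFT's own.

```python
import shutil
import subprocess


def build_mafft_command(input_fasta, threads=1):
    """Build the argument list for a MAFFT run.

    --auto lets MAFFT choose an alignment strategy from the input size.
    """
    return ["mafft", "--auto", "--thread", str(threads), input_fasta]


def run_mafft(input_fasta, output_fasta, threads=1):
    """Run MAFFT and save the aligned FASTA it prints on stdout."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    with open(output_fasta, "w") as handle:
        # check=True raises CalledProcessError if MAFFT exits with an error.
        subprocess.run(
            build_mafft_command(input_fasta, threads),
            stdout=handle,
            check=True,
        )
    return output_fasta
```

Calling run_mafft("genes.fasta", "genes.aln.fasta", threads=4) would write the alignment to genes.aln.fasta (both file names are placeholders). Wrapping a different tool is the same pattern with a different argument list.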
Thanks, I know how to use Python and Ruby and the functions from their packages. I have spent 7 years programming and studying computer science, and I almost never ask questions about programming methodology and practice. For WGS data with 4.5 million nucleotide pairs, this does not seem to be the best option. Can you propose a better solution?
Ah, whole genome changes things completely. Most MSA programs are oriented toward proteins (that's what MSA was originally designed around). I'd be surprised if biopython provided any facilities for things of that magnitude. You'll likely need to write your own wrappers. See this thread for pointers on where to start: Help With Multiple Whole Genome Alignment. Aligning Over 400 Whole Genomes
I have Bowtie, RBowtie, Mummer and Mugsy, but I can't say they are good or easy enough to use for my goals. It seems that everything free is complete bullshit, or partially so. I need a free equivalent of RidomSeqSphere and NextGene; that was my question. I don't want to drive nails with a violin and a flute instead of a heavy hammer.
Bowtie etc. are short-read aligners; you can't expect them to produce whole-genome MSAs. Please see the thread I linked to.