Entering edit mode
6.0 years ago
doramora
▴
10
It may sound quiet easy (for you), so i want to find a bit of help there. How to align reads on reference? I've got a file with reference and another file with lots of reads, how can i align it and output as SAM file in Python? (i know how to do it in bowtie2, but i can't imagine, how to do it in python without subprocess)
(if you can't understand the question - also comment, and i'll try to rewrite some moments that are not clear enough)
why would you like to re-invent the wheel ?
as far as I know, algorithms like the one used by bwa or bowtie haven't been implemented in python. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2828108/
and why do we have biopython itself? it could help only with parsing and other not really big things? (this publication is more than 8 years old. I read it and thought that something could already been changed)
BioPython is designed to be 'glue'. To help convert file formats, provide an easy to use, unified front end for many common bioinformatics tasks. At its core it's just a clever way of storing/representing sequence data. It isn't designed to do heavy lifting like read mapping.
Even for alignments for example, for anything over about 2 sequences, BioPython calls to other programs (likely through subprocess but I haven't checked) such as BLAST, T-Coffee, Clustal etc.
You did not tell us anything about your reads: their length and which technology, DNA-sequencing or RNA-sequencing. Your question confuses me a bit, I wonder why you would use subprocess in python if you can just use the command line to execute bowtie. Using subprocess adds an unnecessary layer of complexity. Perhaps you can explain again why you would like to use subprocess for this.
Anyway, if the minimap2 aligner is good for your application then you can use it in python, see this page. But that is just a python api/interface to the minimap2 aligner, which itself is not written in python.
Why not use subprocess? This is exactly what it's for. Though it adds a layer of complexity over just using a shell script.
You wouldn't want to write an aligner in python anyway, it would be painfully slow and inefficient, though like any good programming language, you can in principle code the algorithm in pure python.
I think you have a general misunderstanding of what Python and bowtie2 are. Python is a language, not a tool. Bowtie2 is a tool, written in mostly C++, but not a language itself. Therefore, I do not quiet understand the purpose of your question. Could you elaborate?
ok now i'm on the one of the first steps of learning bioinformatics, and my teacher gave me so called 'exercises', to understand, how it works. i've made a 'program' in Linux Terminal with help of bowtie2-build, bowtie2, samtools and bedtools it was like smth of this kind:
and after this i got a task to do the same thing but in python. maybe I just fundamentally don't understand something?
Based on this it seems you are expected to write a workflow in python. Yes, you could do this using subprocess.
Pro tip: use the shlex module when you are using subprocess
If you want to do
bowtie2 -x index -U /path to reads/ --very-sensitive-local -S output.sam
then you could do:For completeness sake I have to tell you that there are far more advanced workflow systems in python, for example snakemake, see also this tutorial.
Ok, so basically executing the pipeline from within Python instead of the Unix shell. Wouter's comment catches that pretty well I think.
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:
Then short of writing your own aligner in python you will have to do this with subprocess as you said in original post.
you can use os library to execute os tools (os.system). but it is not recommended. You can also use python frameworks as ruffus or snakemake elizaveta.karpovich