Question

How to align reads on reference using python?

1

Entering edit mode

6.1 years ago

doramora ▴ 10

It may sound quiet easy (for you), so i want to find a bit of help there. How to align reads on reference? I've got a file with reference and another file with lots of reads, how can i align it and output as SAM file in Python? (i know how to do it in bowtie2, but i can't imagine, how to do it in python without subprocess)

(if you can't understand the question - also comment, and i'll try to rewrite some moments that are not clear enough)

genome sequence gene sequencing • 7.8k views

ADD COMMENT • link updated 6.1 years ago by WouterDeCoster 47k • written 6.1 years ago by doramora ▴ 10

2

Entering edit mode

, but i can't imagine, how to do it in python without subprocess)

why would you like to re-invent the wheel ?

as far as I know, algorithms like the one used by bwa or bowtie haven't been implemented in python. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2828108/

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

and why do we have biopython itself? it could help only with parsing and other not really big things? (this publication is more than 8 years old. I read it and thought that something could already been changed)

ADD REPLY • link 6.1 years ago by doramora ▴ 10

2

Entering edit mode

BioPython is designed to be 'glue'. To help convert file formats, provide an easy to use, unified front end for many common bioinformatics tasks. At its core it's just a clever way of storing/representing sequence data. It isn't designed to do heavy lifting like read mapping.

Even for alignments for example, for anything over about 2 sequences, BioPython calls to other programs (likely through subprocess but I haven't checked) such as BLAST, T-Coffee, Clustal etc.

ADD REPLY • link 6.1 years ago by Joe 21k

1

Entering edit mode

You did not tell us anything about your reads: their length and which technology, DNA-sequencing or RNA-sequencing. Your question confuses me a bit, I wonder why you would use subprocess in python if you can just use the command line to execute bowtie. Using subprocess adds an unnecessary layer of complexity. Perhaps you can explain again why you would like to use subprocess for this.

Anyway, if the minimap2 aligner is good for your application then you can use it in python, see this page. But that is just a python api/interface to the minimap2 aligner, which itself is not written in python.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

1

Entering edit mode

Why not use subprocess? This is exactly what it's for. Though it adds a layer of complexity over just using a shell script.

You wouldn't want to write an aligner in python anyway, it would be painfully slow and inefficient, though like any good programming language, you can in principle code the algorithm in pure python.

ADD REPLY • link 6.1 years ago by Joe 21k

0

Entering edit mode

I think you have a general misunderstanding of what Python and bowtie2 are. Python is a language, not a tool. Bowtie2 is a tool, written in mostly C++, but not a language itself. Therefore, I do not quiet understand the purpose of your question. Could you elaborate?

ADD REPLY • link 6.1 years ago by ATpoint 86k

0

Entering edit mode

ok now i'm on the one of the first steps of learning bioinformatics, and my teacher gave me so called 'exercises', to understand, how it works. i've made a 'program' in Linux Terminal with help of bowtie2-build, bowtie2, samtools and bedtools it was like smth of this kind:

bowtie2-build /path to reference/ index 
bowtie2 -x index -U /path to reads/ --very-sensitive-local -S output.sam
samtools view -bS output.sam > output.bam 
samtools view -b -F 4 output.bam >f_output.bam 
bedtools bamtofastq -i f_output.bam -fq f_output.fastq

and after this i got a task to do the same thing but in python. maybe I just fundamentally don't understand something?

ADD REPLY • link updated 6.1 years ago by WouterDeCoster 47k • written 6.1 years ago by doramora ▴ 10

1

Entering edit mode

Based on this it seems you are expected to write a workflow in python. Yes, you could do this using subprocess.

Pro tip: use the shlex module when you are using subprocess

If you want to do bowtie2 -x index -U /path to reads/ --very-sensitive-local -S output.sam then you could do:

subprocess.call(shlex.split("bowtie2 -x index -U /path to reads/ --very-sensitive-local -S output.sam"))

For completeness sake I have to tell you that there are far more advanced workflow systems in python, for example snakemake, see also this tutorial.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

1

Entering edit mode

Ok, so basically executing the pipeline from within Python instead of the Unix shell. Wouter's comment catches that pretty well I think.

ADD REPLY • link 6.1 years ago by ATpoint 86k

0

Entering edit mode

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

0

Entering edit mode

after this i got a task to do the same thing but in python. maybe I just fundamentally don't understand something?

Then short of writing your own aligner in python you will have to do this with subprocess as you said in original post.

ADD REPLY • link 6.1 years ago by GenoMax 148k

0

Entering edit mode

you can use os library to execute os tools (os.system). but it is not recommended. You can also use python frameworks as ruffus or snakemake elizaveta.karpovich

ADD REPLY • link 6.1 years ago by cpad0112 21k