Question

extract sequence from genome by region

0

11 months ago

gernophil ▴ 90

Hey, I am looking for a Python-based tool or script to download/extract the sequence of a genomic region. Until now I have used fasta from https://github.com/dancooke/bioio, but I'd like to integrate this step into my Python script without having to compile or run C++ code. Is there a tool available, or does anyone know how to translate this (https://github.com/dancooke/bioio/blob/master/fasta.cpp) to pure Python?

fasta • 1.8k views

ADD COMMENT • link updated 11 months ago by Ram 44k • written 11 months ago by gernophil ▴ 90

0

If you have a FASTA file and a BED file, you can use bedtools getfasta. There is also pybedtools, a Python library that wraps bedtools.

NOTE: As you are using Python, downloading the sequence would be very easy.

Cheers,

Nitin N.

ADD REPLY • link 11 months ago by Nitin Narwade ★ 1.6k

0

I don't have a bed file unfortunately, but this should be easy to generate :). I'll look into it.

ADD REPLY • link 11 months ago by gernophil ▴ 90
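Generating the BED input from a region string could look roughly like the sketch below. It assumes regions are written as 1-based, inclusive `chrom:start-end` strings and that contig names contain no colon; `region_to_bed` is a hypothetical helper, not part of bedtools or pybedtools.

```python
import re

# Hypothetical helper for illustration; not part of bedtools or pybedtools.
# Assumes contig names contain no ':' and coordinates may use thousands commas.
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_bed(region: str) -> str:
    """Convert a 1-based inclusive 'chrom:start-end' region to a BED line."""
    match = _REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"cannot parse region: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # BED is 0-based and half-open: shift the start down by one, keep the end.
    return f"{match['chrom']}\t{start - 1}\t{end}"


if __name__ == "__main__":
    print(region_to_bed("chr1:1,000-2,000"))
```

The coordinate shift is the part that is easy to get wrong: a BED interval covering bases 100 to 200 (1-based) starts at 99 and ends at 200.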

0

11 months ago

Sajad ▴ 90

You can run bioio from Python by using the subprocess module, which is part of the Python standard library. Here is an example:

import subprocess

your_command = "./fasta [options] <fasta_path> <region>" 

subprocess.run(your_command, shell=True)

ADD COMMENT • link 11 months ago by Sajad ▴ 90

0

"you can integrate Bioio in Python by using subprocess module"

How are you "integrating" anything using subprocess.run()?

ADD REPLY • link 11 months ago by Ram 44k

0

That's not integrating; that's just starting the C++ program from Python :).

ADD REPLY • link 11 months ago by gernophil ▴ 90

0

Exactly. Using fancy words to mislead people is just shady.

ADD REPLY • link 11 months ago by Ram 44k

0

subprocess is a module that allows you to run other programs or commands from your Python code. It can be used to run programs, send them data, and get results back.

https://realpython.com/python-subprocess/

ADD REPLY • link 11 months ago by Sajad ▴ 90

0

Does it "integrate" the fasta program into Python? Or does it allow you to run any shell command from Python?

"and get results back"

No. You can set STDOUT and STDERR and get the return code, but you cannot "get results back" unless you pipe the output and then decode the returned object's stdout, as shown here: https://stackoverflow.com/a/4760517/1394178

Even to do that, you need to know where the program writes its required output to - STDOUT or STDERR and jump through hoops. Please don't mislead users with inaccurate statements.

ADD REPLY • link 11 months ago by Ram 44k

0

I just suggested a way to run bioio from Python. Can I use the subprocess module?

import subprocess

your_command = "./fasta [options] <fasta_path> <region>"

subprocess.run(your_command, shell=True)

ADD REPLY • link 11 months ago by Sajad ▴ 90

0

Please edit your answer and change "integrate" to a version of what you mention now, which is "a way to run bioio from within Python". Like you mention, the method call does need the shell=True option and ideally, one should also capture the return value (exit code) and check if it's 0 (meaning command executed successfully).

ADD REPLY • link 11 months ago by Ram 44k
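The points raised in this exchange (capturing the tool's output and checking its exit code) can be sketched as follows. This is a minimal sketch, not the code from the answer: `run_and_capture` is a hypothetical helper, it assumes the tool writes its result to STDOUT, and it passes the command as a list of arguments, which sidesteps `shell=True`. A stand-in command is used so the example runs without the fasta executable.

```python
import subprocess
import sys
from typing import Sequence


def run_and_capture(command: Sequence[str]) -> str:
    """Run a command, return its stdout as text, raise if it exits non-zero."""
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured bytes to str
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For the tool in this thread,
    # the list would hold the path to the fasta executable, then its arguments.
    print(run_and_capture([sys.executable, "-c", "print('ACGT')"]), end="")
```

This still only launches an external program; it does not remove the need to have a compiled executable on the machine.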

0

Besides that, this answer still requires compiling fasta.cpp. I want to get rid of the C++ part/code completely.

ADD REPLY • link 11 months ago by gernophil ▴ 90

0

That's on you, OP - you should have mentioned more clearly that you wish to avoid that tool entirely. When you run ./fasta, you're not running cpp code, you're simply using a tool.

ADD REPLY • link 11 months ago by Ram 44k

0

One could argue about that, because to run fasta I would need to compile the C++ code, which I mentioned I want to avoid. But I'll update the question ;).

ADD REPLY • link 11 months ago by gernophil ▴ 90

0

I know this is pedantic, but given you're already using the fasta executable, you will not need to compile it again. You can't save effort you've already invested so unless you're creating a pipeline to be used on a different machine, compiling is a moot point.

ADD REPLY • link 11 months ago by Ram 44k

1

Totally get you, and you're right. To be exact, I want others to be able to run the pipeline without having to compile the fasta executable on different machines and platforms.

ADD REPLY • link 11 months ago by gernophil ▴ 90

0

I'd recommend comparing the speed of this tool with that of a pyfaidx-based implementation before deciding. You don't want to compromise too much on performance for a task that can be automated as easily as make-ing a simple tool.

ADD REPLY • link 11 months ago by Ram 44k

score 2 · Accepted Answer · 2023-12-20

2

11 months ago

cmdcolin ★ 4.0k

A classic command-line tool for this is "samtools faidx", and it is available in Python as pyfaidx: https://pypi.org/project/pyfaidx/

ADD COMMENT • link 11 months ago by cmdcolin ★ 4.0k
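For readers curious why faidx-style lookup avoids reading the whole genome, here is a dependency-free sketch of the idea: record, per sequence, its length, the byte offset of its first base, and the line layout, then seek straight to the requested bases. It is not pyfaidx's implementation; `IndexEntry`, `build_index`, and `fetch` are hypothetical names, and the sketch assumes a well-formed FASTA file in which every line of a record (except possibly the last) has the same length.

```python
from typing import Dict, NamedTuple


class IndexEntry(NamedTuple):
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the file
    line_bases: int  # bases per full line
    line_width: int  # bytes per full line, including the line ending


def build_index(fasta_path: str) -> Dict[str, IndexEntry]:
    """Scan the file once and record where each sequence lives."""
    index: Dict[str, IndexEntry] = {}
    name = None
    length = offset = line_bases = line_width = 0
    position = 0  # byte offset of the line currently being read
    with open(fasta_path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if name is not None:
                    index[name] = IndexEntry(length, offset, line_bases, line_width)
                name = line[1:].split()[0].decode()
                length = line_bases = line_width = 0
                offset = position + len(line)
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                if line_bases == 0:
                    line_bases, line_width = bases, len(line)
                length += bases
            position += len(line)
    if name is not None:
        index[name] = IndexEntry(length, offset, line_bases, line_width)
    return index


def fetch(fasta_path: str, index: Dict[str, IndexEntry],
          name: str, start: int, end: int) -> str:
    """Return bases [start, end) using 0-based, half-open coordinates."""
    entry = index[name]
    if not 0 <= start <= end <= entry.length:
        raise ValueError("region out of bounds")
    if start == end:
        return ""

    def byte_offset(pos: int) -> int:
        # Whole lines before pos, plus the position inside the current line.
        full_lines, within = divmod(pos, entry.line_bases)
        return entry.offset + full_lines * entry.line_width + within

    with open(fasta_path, "rb") as handle:
        handle.seek(byte_offset(start))
        raw = handle.read(byte_offset(end) - byte_offset(start))
    # Drop the line endings that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

In practice pyfaidx handles the indexing itself, so the equivalent lookup is typically written as `str(Fasta(path)["chr1"][start:end])`, where `str()` turns the returned sequence object into plain text.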

0

That looks like what I need :). Thanks, I'll look into it.

ADD REPLY • link 11 months ago by gernophil ▴ 90

0

That did it for me. Thanks a lot. The only thing I needed to be aware of was casting the pyfaidx object to a string; after that I was able to get exactly the same results as before.

ADD REPLY • link 11 months ago by gernophil ▴ 90