Question

How to extract a gene sequence by its coordinates on the reference genome?

0

5.6 years ago

zhangdengwei ▴ 210

Hi, how can I extract a gene sequence by its coordinates on the reference genome? Thanks!

sequence • 2.6k views

0

What have you tried? I would suggest using Biopython and string slicing notation, of which there are many examples on the forum.

You also haven’t told us what format your data is in.
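
To illustrate the slicing suggestion, here is a minimal sketch in plain Python. It assumes the coordinates are 1-based and inclusive (as in GFF or GenBank annotations) and that the sequence is already in memory as a string; the function name `extract_region` and the toy sequence are made up for the example. With Biopython, the same slice should work on the `seq` of a record returned by `SeqIO`, and `Seq.reverse_complement()` can replace the manual complement step.

```python
# Complement table for DNA, including N and lowercase (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(sequence, start, end, strand="+"):
    """Return bases start..end of `sequence` (1-based, inclusive).

    For strand "-", the reverse complement of the region is returned.
    """
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"invalid region {start}-{end} for length {len(sequence)}")
    # 1-based inclusive [start, end] maps to the Python slice [start - 1:end].
    region = sequence[start - 1:end]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    chromosome = "AACCGGTTAC"  # toy sequence standing in for a real record
    print(extract_region(chromosome, 2, 5))       # plus strand
    print(extract_region(chromosome, 2, 5, "-"))  # minus strand
```

The off-by-one conversion is the usual source of errors here, so it is worth checking which convention your coordinates follow (BED files, for example, are 0-based and half-open).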

5.6 years ago by Joe 21k

0

This is a very briefly formulated question! Perhaps read this first: How To Ask Good Questions On Technical And Scientific Forums

What kind of input files do you have? Do you want to do this for a single gene, multiple genes, or all genes? Do take the effort to include a bit more info to get more suitable answers.

5.6 years ago by lieven.sterck 15k

0

And do you want the whole gene (introns and exons), the cDNA, the CDS, or the protein sequence?

5.6 years ago by Emily 24k

0

I am sorry I didn't state my question clearly. I am writing a Python script to integrate my pipeline, and one step needs to fetch the DNA sequence between an arbitrary pair of start and end positions from a fairly large FASTA file. So what I want to ask is simply which tool or approach can do this quickly: Biopython or something else?
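
For context on what "quickly" can mean here: the usual trick for large FASTA files is to scan the file once, record where each sequence starts and how its lines are laid out, and then seek straight to the requested bases instead of loading the whole genome. The sketch below shows that idea in plain Python. It is similar in spirit to the `.fai` index used by `samtools faidx`, but it is only an illustration: the names `build_index` and `fetch` are made up, and it assumes every sequence line within a record (except the last) has the same width.

```python
import os
import tempfile


def build_index(fasta_path):
    """Scan the FASTA once and record where each sequence sits in the file.

    Returns {name: (length, offset, bases_per_line, bytes_per_line)}.
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Use the first word of the header as the sequence name.
                name = line[1:].split()[0].decode()
                # Sequence data starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) of sequence `name`."""
    length, offset, bases_per_line, bytes_per_line = index[name]
    if start < 1 or end < start or end > length:
        raise ValueError(f"invalid region {name}:{start}-{end}")

    def byte_position(base0):
        # Record offset + full lines before the base + column within the line.
        return offset + (base0 // bases_per_line) * bytes_per_line + base0 % bases_per_line

    first, last = byte_position(start - 1), byte_position(end - 1)
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first + 1)
    # The raw bytes still contain the line breaks that fell inside the region.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    # Toy FASTA written to a temporary file, standing in for a real genome.
    fd, path = tempfile.mkstemp(suffix=".fa")
    with os.fdopen(fd, "w", newline="\n") as out:
        out.write(">chr1 toy\nACGTACGTAC\nGGGTTTAAAC\nCC\n")
    toy_index = build_index(path)
    print(fetch(path, toy_index, "chr1", 9, 13))
    os.remove(path)
```

The index only needs to be built once per file, so repeated lookups cost one seek and one short read each, regardless of genome size.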

5.6 years ago by zhangdengwei ▴ 210

0

If it's within a Python pipeline or project, then yes, Biopython is likely the more sensible option. Otherwise you could, for instance, also get this through BLAST if you have a blastdb-formatted version of your FASTA file.

5.6 years ago by lieven.sterck 15k

0

Take a look at my code here: https://github.com/jrjhealey/bioinfo-tools/blob/master/Genbank_slicer.py

The same approach would work for FASTA files as well as GenBank files, etc.

5.6 years ago by Joe 21k

1

Thank you very much for your time!

5.6 years ago by zhangdengwei ▴ 210

score 0 · Answer 1 · 2019-04-17

0

5.6 years ago

zhangdengwei ▴ 210

I found a Python module named "pyfaidx" that meets my needs. It makes fetching sequences from a FASTA file simple. Here is the link: https://pypi.org/project/pyfaidx/#description