Question

How to retrieve sequences from a FASTA file using start and end positions from an Excel file?

0

4.8 years ago

akashbala0 ▴ 10

Hi! I have an Excel file listing thousands of chromosome names with their transcription start and end positions. Someone, please develop a python program that retrieves the sequences from the genome file according to the start and end positions given in the Excel file.

The Excel file looks like this:

    chromosome  start   end
    KB317696.1  1361    1376
    KB317696.1  1594    1929
    KB317697.1  2033    2101
    KB317697.1  2159    2265
    KB317698.1  2319    2421
    KB317699.1  2513    2736
    KB317700.1  2789    2903
    KB317700.1  3157    3279

python biopython • 2.1k views

ADD COMMENT • link updated 4.8 years ago by GouthamAtla 12k • written 4.8 years ago by akashbala0 ▴ 10

1

please develop a python program

That is not what the forum is here for. There is an expectation that you demonstrate some effort toward solving the problem yourself first. Moreover, this is not an uncommon task, so please search the forum; there will undoubtedly be existing solutions you can try.

ADD REPLY • link 4.8 years ago by Joe 22k

0

This can be done in the following steps:

- Import the sequence using Biopython (SeqIO.read() for a single-record file; a genome file with several records needs SeqIO.parse() or SeqIO.index() instead).
- Import the Excel file as a table using pandas, and subset the table to keep only the chromosome, start, and end columns.
- Go through the rows of this table in a for loop and slice the sequence using the start and end entries (example: seq_output = sequence[start_position:end_position]).

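P.S. Python indexing starts from 0 and a slice excludes its end position, so if your coordinates are 1-based and inclusive, the slice should really be sequence[start_position-1:end_position].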
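As a rough illustration of the steps above, here is a minimal sketch. It assumes the coordinates in the spreadsheet are 1-based and inclusive, that the columns are named chromosome, start, and end as in the question, and that Biopython, pandas, and an Excel reader such as openpyxl are installed. The function names and file paths are hypothetical, not taken from the thread.

```python
def extract_regions(sequences, regions):
    """Slice each (chromosome, start, end) region out of `sequences`.

    `sequences` maps a chromosome name to its sequence (str or Bio.Seq.Seq).
    Coordinates are ASSUMED to be 1-based and inclusive.
    """
    extracted = []
    for chrom, start, end in regions:
        seq = sequences[chrom]
        # 1-based inclusive -> 0-based, end-exclusive Python slice
        fragment = seq[int(start) - 1:int(end)]
        extracted.append((chrom, int(start), int(end), str(fragment)))
    return extracted


def load_and_extract(fasta_path, excel_path):
    """Read a multi-record FASTA and an Excel table, then extract the regions."""
    # Imported here so extract_regions() can be used without these packages.
    import pandas as pd
    from Bio import SeqIO

    # Holds the whole genome in memory; SeqIO.index() is an alternative
    # for large files.
    genome = {rec.id: rec.seq for rec in SeqIO.parse(fasta_path, "fasta")}

    table = pd.read_excel(excel_path)  # assumed columns: chromosome, start, end
    rows = table[["chromosome", "start", "end"]].itertuples(index=False, name=None)
    return extract_regions(genome, rows)


# Toy check with an in-memory sequence instead of real files
toy_genome = {"KB317696.1": "ACGTACGTAC"}
print(extract_regions(toy_genome, [("KB317696.1", 2, 5)]))
# prints [('KB317696.1', 2, 5, 'CGTA')]
```

With real data the call would look something like load_and_extract("genome.fasta", "regions.xlsx"), where both file names are placeholders. If the coordinates turn out to be 0-based, the `- 1` in the slice should be dropped.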

ADD REPLY • link 4.8 years ago by manaswwm ▴ 560

0

duplicate: How to use Bed file to extract sequence from FASTA file? ; Extract several sequences from genome in FASTA format with genomic coordinates. ; how to quickly extract sequence from genome positions ; ... etc... etc...

ADD REPLY • link 4.8 years ago by Pierre Lindenbaum 165k

score 0 · Answer 1 · 2020-07-05

0

4.8 years ago

GouthamAtla 12k

Just to close this post: you can check bedtools getfasta, or the Biopython documentation if you are into Python. No need to reinvent the wheel. A simple Google search would have given you a quicker solution. Also, BED is the standard file format for storing such data.
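If you go the BED route, the table from the question has to be converted first. The sketch below assumes the Excel coordinates are 1-based and inclusive. BED starts are 0-based and BED ends are exclusive, so only the start is shifted by one. The function names and file names are hypothetical.

```python
def to_bed_lines(regions):
    """Convert (chromosome, start, end) rows to tab-separated BED lines.

    ASSUMES 1-based inclusive input. BED uses a 0-based start and an
    exclusive end, so the start drops by one and the end stays unchanged.
    """
    lines = []
    for chrom, start, end in regions:
        lines.append(f"{chrom}\t{int(start) - 1}\t{int(end)}")
    return lines


def write_bed(path, regions):
    """Write the converted regions to `path`, one BED record per line."""
    with open(path, "w") as handle:
        for line in to_bed_lines(regions):
            handle.write(line + "\n")


# First two rows of the table in the question
rows = [("KB317696.1", 1361, 1376), ("KB317696.1", 1594, 1929)]
print("\n".join(to_bed_lines(rows)))

# After write_bed("regions.bed", rows), the sequences could be pulled with:
#   bedtools getfasta -fi genome.fasta -bed regions.bed -fo regions.fasta
```

The rows can come from pandas in the same way as any other table, for example via itertuples(index=False, name=None) on the three columns.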

ADD COMMENT • link 4.8 years ago by GouthamAtla 12k