Question

How to extract 3' and 5' UTR sequences from GenBank records using Python, Perl, or R?

0

5.3 years ago

mathavanbioinfo ▴ 80

Hello all, I have 1000 GenBank records. I want to extract only the 3' UTR and 5' UTR sequences from them and store the results in Excel format. Please share your ideas and suggestions (using Perl, Python, or R).

UTR • 2.4k views

ADD COMMENT • link updated 5.3 years ago by zubenel ▴ 120 • written 5.3 years ago by mathavanbioinfo ▴ 80

0

Hi, please post a sample GenBank (.gbk) file and list the column headers that you want to see in your output file (e.g. seqID, locusTag, sequence ...).

ADD REPLY • link 5.3 years ago by hugo.avila ▴ 540

0

Please take a look at the Biopython cookbook and tutorial.
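As a starting point, a minimal Biopython sketch might look like the following. It assumes the records carry explicit `5'UTR` and `3'UTR` features (many GenBank records do not annotate UTRs, in which case no rows are produced). The helper names `utr_rows` and `write_utr_table` are made up for illustration. The output is a CSV file, which Excel can open directly.

```python
import csv

# Feature keys used for UTRs in the GenBank/INSDC feature table.
UTR_TYPES = ("5'UTR", "3'UTR")


def utr_rows(records, utr_types=UTR_TYPES):
    """Yield one row per UTR feature found in the given SeqRecord objects."""
    for record in records:
        for feature in record.features:
            if feature.type not in utr_types:
                continue
            loc = feature.location
            # feature.extract() handles joined locations and
            # reverse-complements minus-strand features.
            seq = str(feature.extract(record.seq))
            # Biopython positions are 0-based and end-exclusive;
            # report them 1-based and inclusive.
            yield [record.id, feature.type,
                   int(loc.start) + 1, int(loc.end), loc.strand, seq]


def write_utr_table(genbank_path, out_csv):
    """Write all annotated UTRs from a GenBank file to a CSV table."""
    from Bio import SeqIO  # requires Biopython (pip install biopython)

    count = 0
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(
            ["record_id", "feature", "start", "end", "strand", "sequence"]
        )
        for row in utr_rows(SeqIO.parse(genbank_path, "genbank")):
            writer.writerow(row)
            count += 1
    return count
```

Calling `write_utr_table("records.gb", "utrs.csv")` (both file names are placeholders) would then write one row per annotated UTR and return the number of rows written.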

ADD REPLY • link 5.3 years ago by WouterDeCoster 47k

score 0 · Answer 1 · 2019-12-29

0

5.3 years ago

padwalmk ▴ 140

Hi, it's unclear whether you have a GFF file or a FASTA file.

You can look into the following post:

Extract coordinates of upstream region up to closest coding region in R

ADD COMMENT • link 5.3 years ago by padwalmk ▴ 140

score 0 · Answer 2 · 2019-12-29

0

5.3 years ago

zubenel ▴ 120

If you have a GFF file, you might try gff2fasta.pl with the -feature option set to "five_prime_UTR" or "three_prime_UTR" or something like that. You could also read up on how to get the sequences of specific features with BioPerl.
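The GFF-based approach can also be sketched in plain Python. This is an illustration of the idea, not a reimplementation of gff2fasta.pl. It assumes a GFF3 file whose UTR rows use the type names `five_prime_UTR` and `three_prime_UTR`, plus the matching genome FASTA. The function names `read_fasta` and `extract_features` are hypothetical.

```python
# Translation table for complementing DNA (case-preserving).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA text lines into a {sequence_id: sequence} dict."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence ID.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def extract_features(gff_lines, genome,
                     feature_types=("five_prime_UTR", "three_prime_UTR")):
    """Yield (seqid, type, start, end, strand, sequence) for matching rows."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in feature_types:
            continue
        seqid, ftype, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        # GFF3 coordinates are 1-based and inclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            # Reverse-complement features on the minus strand.
            seq = seq.translate(_COMPLEMENT)[::-1]
        yield (seqid, ftype, start, end, strand, seq)
```

Both functions accept any iterable of lines, so open file handles can be passed in directly. Each GFF row is extracted on its own, so a UTR split across several exons comes out as several rows rather than one joined sequence.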

ADD COMMENT • link 5.3 years ago by zubenel ▴ 120