How to identify 3'UTR region?
4.8 years ago
2822462298 ▴ 120

I need to find the 3' UTR of each gene to predict miRNA targets, but I only have a GFF file without any UTR annotation. Is there a tool I can use to identify 3' UTR regions?

(genome and RNA-seq data available)

Here is an example from my current GFF file (the CDS and exons start and end at the same positions as the gene and mRNA, so there is no UTR):

Bany_Scaf21 B_anynana_v2    gene    6013946 6020193 .   -   .   ID=BANY.1.2.t00009.path1;Name=BANY.1.2.t00009
Bany_Scaf21 B_anynana_v2    mRNA    6013946 6020193 .   -   .   ID=BANY.1.2.t00009.mrna1;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.path1;coverage=99.1;identity=100.0;matches=229;mismatches=0;indels=0;unknowns=0
Bany_Scaf21 B_anynana_v2    exon    6020095 6020193 100 -   .   ID=BANY.1.2.t00009.mrna1.exon1;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 3 101 +
Bany_Scaf21 B_anynana_v2    exon    6016799 6016862 100 -   .   ID=BANY.1.2.t00009.mrna1.exon2;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 102 165 +
Bany_Scaf21 B_anynana_v2    exon    6014273 6014317 100 -   .   ID=BANY.1.2.t00009.mrna1.exon3;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 166 210 +
Bany_Scaf21 B_anynana_v2    exon    6013946 6013966 100 -   .   ID=BANY.1.2.t00009.mrna1.exon4;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 211 231 +
Bany_Scaf21 B_anynana_v2    CDS 6020095 6020192 100 -   0   ID=BANY.1.2.t00009.mrna1.cds1;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 4 101 +
Bany_Scaf21 B_anynana_v2    CDS 6016799 6016862 100 -   2   ID=BANY.1.2.t00009.mrna1.cds2;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 102 165 +
Bany_Scaf21 B_anynana_v2    CDS 6014273 6014317 100 -   0   ID=BANY.1.2.t00009.mrna1.cds3;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 166 210 +
Bany_Scaf21 B_anynana_v2    CDS 6013946 6013966 100 -   0   ID=BANY.1.2.t00009.mrna1.cds4;Name=BANY.1.2.t00009;Parent=BANY.1.2.t00009.mrna1;Target=BANY.1.2.t00009 211 231 +
RNA-Seq mRNA transcriptome genome • 1.7k views

Which genome/organism is this?


Hi Genomax, it's Bicyclus anynana (butterfly), a non-model organism. I don't think I can download it from the database.


Hi,

it's more of a vague idea: you could try to find the polyadenylation signal (PAS) and define the region between it and the end of the CDS as the 3' UTR. The signal is usually an A-rich hexamer (canonically AAUAAA). You could use the 3' UTR length distribution of a related species to limit the search space.
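A minimal Python sketch of the PAS scan, under some assumptions: the sequence downstream of the stop codon has already been extracted from the genome and is on the transcript strand (reverse-complemented for minus-strand genes), the hexamers searched are the canonical AATAAA and the common variant ATTAAA written as DNA, and the 2000 nt search window is an arbitrary placeholder that should be replaced by a limit taken from a related species. The function names are made up for illustration.

```python
import re

# Canonical polyadenylation signal and its most common variant, written as DNA.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for genes on the '-' strand)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_pas_candidates(downstream_seq, max_utr_len=2000, hexamers=PAS_HEXAMERS):
    """Return sorted (offset, hexamer) hits in the sequence after the stop codon.

    downstream_seq must be on the transcript strand; offsets are 0-based
    relative to the first base after the CDS. max_utr_len is an assumed
    upper bound on 3' UTR length, not a measured value.
    """
    window = downstream_seq[:max_utr_len].upper()
    hits = []
    for hexamer in hexamers:
        # A lookahead pattern reports overlapping matches as well.
        for match in re.finditer("(?=%s)" % hexamer, window):
            hits.append((match.start(), hexamer))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real Bicyclus anynana data.
    for offset, hexamer in find_pas_candidates("GGGAATAAACCCATTAAAGG"):
        print(offset, hexamer)
```

Every hit is only a candidate: A-rich hexamers also occur by chance, and cleavage typically happens some 10 to 30 nt downstream of the signal, so the hits should be filtered with other evidence before being used as UTR ends.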

Cheers,

Michael

4.8 years ago

The UTRs are basically the exons minus the CDS.

So you could use bedtools subtract to subtract the CDS from the exons (put all the exons in one file and all the CDSs in another, then subtract the second from the first). However, this would a) give you both 5' and 3' UTRs, and b) possibly subtract the CDS of one transcript from the exons of another.

When I've done this in the past, I've found the easiest way to do it was with some custom code that finds the end point of the CDS for a transcript, finds the exons that end after that point, and truncates the first of them to start at the CDS end. My code for this will only work with GTF, though; I don't think I have parsing code for GFF3.
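The per-transcript logic described above can be sketched in Python roughly as follows. This is not the code referred to in the answer, just an illustration that assumes the exon and CDS intervals for one transcript have already been parsed into 1-based inclusive (start, end) tuples; the function name is hypothetical. On the minus strand the 3' end of the CDS is its lowest coordinate, so the comparison is mirrored.

```python
def three_prime_utr(exons, cds, strand):
    """Return 3' UTR intervals for a single transcript.

    exons, cds: lists of (start, end) tuples, 1-based inclusive, all belonging
    to the same transcript. strand: '+' or '-'.
    Whether the stop codon counts as CDS depends on the annotation, so the
    first UTR interval may or may not begin with the stop codon.
    """
    if not cds:
        return []
    utr = []
    if strand == "+":
        cds_end = max(end for _, end in cds)
        for start, end in sorted(exons):
            if end > cds_end:
                # Truncate the exon that contains the CDS end.
                utr.append((max(start, cds_end + 1), end))
    else:
        # On the minus strand the CDS ends at its smallest coordinate.
        cds_end = min(start for start, _ in cds)
        for start, end in sorted(exons):
            if start < cds_end:
                utr.append((start, min(end, cds_end - 1)))
    return utr


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    print(three_prime_utr([(100, 200), (300, 500)], [(150, 200), (300, 350)], "+"))
    print(three_prime_utr([(100, 200), (300, 500)], [(150, 200), (300, 450)], "-"))
```

Because it works on one transcript at a time, this avoids subtracting the CDS of one transcript from the exons of another. It returns an empty list when no exon extends past the CDS end.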


Hi, I understand your method, but in my case the exons and CDS are identical. The GFF file is from one of my labmates, and he did not include the UTRs in the exons at all. I may need a tool to extract them from the genome.


Do you know how the annotation was derived?

My immediate feeling is that you have three choices: 1) Try to look for PAS sequences as suggested by @micheal.ante above. 2) Try to align the transcript sequences of the closest relative that does have UTRs annotated. 3) Use RNA-seq data if available.
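For option 3, one simple way to use RNA-seq is to extend each transcript past the CDS end for as long as read coverage continues. The Python sketch below is only an illustration under assumptions: `depths` is a list of per-base read depths starting at the first base after the CDS and ordered in the transcript direction (for example taken from `samtools depth` output and reversed for minus-strand genes), and the thresholds `min_depth` and `max_gap` are arbitrary placeholders that would need tuning. The function name is hypothetical.

```python
def extend_by_coverage(depths, min_depth=5, max_gap=10):
    """Return how many bases past the CDS end are supported by RNA-seq coverage.

    depths: per-base read depths downstream of the CDS, in transcript order.
    min_depth: minimum depth for a base to count as covered (assumed value).
    max_gap: longest tolerated run of low-coverage bases (assumed value).
    """
    last_supported = 0
    gap = 0
    for position, depth in enumerate(depths, start=1):
        if depth >= min_depth:
            last_supported = position
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                # Coverage has been lost for too long; stop extending.
                break
    return last_supported


if __name__ == "__main__":
    # Toy depths: 50 covered bases, a 20-base gap, then 5 covered bases.
    toy = [10] * 50 + [0] * 20 + [10] * 5
    print(extend_by_coverage(toy))
    print(extend_by_coverage(toy, max_gap=30))
```

This ignores splicing within the UTR and cannot separate a UTR from an overlapping or closely adjacent gene, so the result is best combined with the PAS search from option 1 or capped at the start of the next annotated gene.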
