Question

From GTF file to GFF file (searching for all 5'UTRs, multiple UTRs per transcript)

7.9 years ago

Walter Baumann ▴ 10

Hi, I downloaded the GTF file from the Gencode website and now want to create a GFF file containing all 5' UTRs, which will then be used in HTSeq.

I have the following problems writing my code on the command line:

- How do I obtain the 5' UTRs of each transcript?
- How do I deal with the + and - strands? I know that the 5' UTRs are those at the 5' end of the transcript.

In several cases there are more than 2 UTRs per transcript. What should I do with them?

This is my first attempt at building the final GFF file, containing all UTRs:

awk '{OFS="\t"; if($3=="UTR"){print $1,$2,$3,$4,$5,".",$7,$10,$12}}' Geneannotation_all.gtf | sed 's/";//g; s/"//g' > Geneannotation_all.UTR.gff

RNA-Seq rna-seq gtf gff • 2.9k views

ADD COMMENT • link updated 7.9 years ago by Jeffin Rockey ★ 1.3k • written 7.9 years ago by Walter Baumann ▴ 10

From the htseq-count FAQ

I have a GTF file? How do I convert it to GFF? No need to do that, because GTF is a tightening of the GFF format. Hence, all GTF files are GFF files, too. By default, htseq-count expects a GTF file.

For getting the UTRs I would use grep.

In several cases there are more than 2 UTRs per transcript. What should I do with them?

That probably depends on your biological research question; you could consider merging them.
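One way to approach the strand question is sketched below. It assumes a Gencode-style GTF in which 5' and 3' UTRs both carry the feature name UTR (as in the awk command in the question), so each UTR row is compared with the CDS span of its transcript: on the + strand a 5' UTR ends before the first CDS base, on the - strand it starts after the last CDS base. A UTR that spans several exons shows up as several rows, which would explain more than two UTR rows per transcript. The function name five_prime_utrs, the relabeling to five_prime_UTR, and the toy coordinates are made up for illustration.

```shell
# Sketch only: assumes tab-separated GTF with "UTR" and "CDS" rows and a
# transcript_id attribute in column 9.
five_prime_utrs() {
  # The file is read twice: pass 1 records the CDS span per transcript,
  # pass 2 keeps the UTR rows that lie upstream of that span.
  awk -F'\t' -v OFS='\t' '
    function tid(attr) {
      # Pull the transcript_id value out of the attribute column.
      if (match(attr, /transcript_id "[^"]+"/))
        return substr(attr, RSTART + 15, RLENGTH - 16)
      return ""
    }
    /^#/ { next }
    NR == FNR {
      if ($3 == "CDS") {
        t = tid($9)
        if (!(t in cds_start) || ($4 + 0) < cds_start[t]) cds_start[t] = $4 + 0
        if (!(t in cds_end)   || ($5 + 0) > cds_end[t])   cds_end[t]   = $5 + 0
      }
      next
    }
    $3 == "UTR" {
      t = tid($9)
      if (!(t in cds_start)) next   # no CDS for this transcript: skip
      if ($7 == "+" && ($5 + 0) < cds_start[t]) { $3 = "five_prime_UTR"; print }
      if ($7 == "-" && ($4 + 0) > cds_end[t])   { $3 = "five_prime_UTR"; print }
    }
  ' "$1" "$1"
}

# Toy input: T1 on the + strand, T2 on the - strand with a two-exon UTR.
gtf=$(mktemp)
{
  printf 'chr1\tdemo\tUTR\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\tCDS\t151\t300\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\tUTR\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
  printf 'chr1\tdemo\tUTR\t500\t550\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
  printf 'chr1\tdemo\tCDS\t551\t700\t.\t-\t0\tgene_id "G2"; transcript_id "T2";\n'
  printf 'chr1\tdemo\tUTR\t701\t800\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
  printf 'chr1\tdemo\tUTR\t900\t950\t.\t-\t.\tgene_id "G2"; transcript_id "T2";\n'
} > "$gtf"

five_prime_utrs "$gtf"
```

The relabeled rows could then be given to htseq-count with -t five_prime_UTR and -i transcript_id, if counting per transcript is what you are after.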

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

There is the possibility to download a GTF file containing information about the 5' UTRs per transcript. How can I add the information? (I can only find information on the exons, cdsStart and cdsEnd.)

ADD REPLY • link 7.9 years ago by Walter Baumann ▴ 10

Sorry, add which information to what?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

Sorry. I want to download a gtf and bed file containing the location of the 5'UTR per transcript, so that I can use it in htseq and bedtools for further analysis. I already have an alignment.

ADD REPLY • link 7.9 years ago by Walter Baumann ▴ 10

GTF/GFF files (which can be converted to bed) are available from Ensembl and contain UTR information. You could filter the file using grep.
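As a rough sketch of the filter-and-convert step, the function below keeps one feature type from a GTF and writes six-column BED. It assumes Ensembl-style feature names (five_prime_utr in column 3) and a transcript_id attribute; the function name gtf_to_bed and the sample line are hypothetical. The only coordinate change is the start, because GTF is 1-based and inclusive while BED is 0-based and half-open.

```shell
# Sketch only: $1 = GTF file, $2 = feature name in column 3
# (defaults to five_prime_utr, an assumption about the input).
gtf_to_bed() {
  awk -F'\t' -v OFS='\t' -v feat="${2:-five_prime_utr}" '
    /^#/ { next }
    $3 == feat {
      name = "."
      if (match($9, /transcript_id "[^"]+"/))
        name = substr($9, RSTART + 15, RLENGTH - 16)
      # Shift the start by one; the end stays the same.
      print $1, $4 - 1, $5, name, 0, $7
    }
  ' "$1"
}

# Toy input with made-up coordinates and IDs.
demo=$(mktemp)
printf 'chr1\tensembl\tfive_prime_utr\t100\t150\t.\t-\t.\tgene_id "G9"; transcript_id "T9";\n' > "$demo"
printf 'chr1\tensembl\tCDS\t10\t99\t.\t-\t0\tgene_id "G9"; transcript_id "T9";\n' >> "$demo"

gtf_to_bed "$demo"
```

For a Gencode file, where both UTR types are labeled UTR, the second argument would have to be UTR and the 5'/3' split done separately.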

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

score 0 · Answer 1 · 2017-02-16

If I understood properly, you have a GTF file with exon and CDS information but without UTR coordinates specified as such, and you would like to add them.

A roundabout and not particularly elegant approach is given below, but I suppose it would meet your requirement.

i) Download gtfToGenePred and genePredToGtf

ii) gtfToGenePred yourGenemodel.gtf yourGenemodel.gtf.genepred

iii) genePredToGtf -utr file yourGenemodel.gtf.genepred yourGenemodel.WithUtr.gtf
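Once step iii has produced a GTF with UTR rows, keeping only the 5' ones could look like the sketch below. It assumes that the -utr option labels those rows 5UTR (and the 3' ones 3UTR) in column 3, which is worth checking against your own output. The function name keep_5utr and the toy lines standing in for yourGenemodel.WithUtr.gtf are made up.

```shell
# Sketch only: keep rows whose feature column is "5UTR"
# (an assumption about what genePredToGtf -utr writes).
keep_5utr() {
  awk -F'\t' '$3 == "5UTR"' "$1"
}

# Toy stand-in for yourGenemodel.WithUtr.gtf (made-up coordinates and IDs).
demo=$(mktemp)
printf 'chr1\tfile\t5UTR\t100\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'  > "$demo"
printf 'chr1\tfile\tCDS\t151\t300\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n' >> "$demo"
printf 'chr1\tfile\t3UTR\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >> "$demo"

keep_5utr "$demo"
```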

Please give this a try and check.

Addendum:

Also see the post below, which addresses a similar requirement:

UTR annotation on top of reference GTF