Yeast GTF file needed for RNA-SeQC run
7.9 years ago
Lina F ▴ 200

Hi all,

I have some yeast RNAseq data and I would like to run RNA-SeQC to get an overview of the quality of the run.

I got both the reference fasta sequence and the GTF file from Ensembl here:

# fasta 
ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz

# gtf
ftp://ftp.ensembl.org/pub/current_gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz

However, it seems that there is no rRNA information in the GTF file. When I run RNA-SeQC, I get the following output in my log:

RNA-SeQC v1.1.8.1 07/11/14
Creating rRNA Interval List based on given GTF annotations
Retriving contig names from reference
     contig names in reference: 17
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format:    0 s
java.lang.RuntimeException: No rRNA found in GTF transcript_type field
    at org.broadinstitute.cga.rnaseq.TranscriptList.toRRNAIntervalList(TranscriptList.java:414)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.createRefGeneAndRRNAFiles(RNASeqMetrics.java:1306)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.prepareFiles(RNASeqMetrics.java:196)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.execute(RNASeqMetrics.java:170)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.main(RNASeqMetrics.java:139)
No information for rRNA available. Continuing without rRNA calculations. (Using the -BWArRNA flag for best results)
...

I would like the rRNA information if possible. Does anyone know where to get a GTF file with rRNA information for yeast?

Thanks!

qc gtf yeast RNA-Seq RNA-SeQC • 3.4k views
7.9 years ago
apa@stowers ▴ 610

RNA-SeQC appears to be looking for an annotation field "transcript_type", which does not exist in Ensembl GTFs; Ensembl uses "transcript_biotype" and "gene_biotype" instead.
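
To see which attribute names and values a GTF actually carries, a rough Python sketch like the one below tallies the values of one attribute key over the transcript records. The function name and the two example records are made up for illustration; the only assumption about the file is the usual nine tab-separated columns with `key "value";` pairs in the last one.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attributes column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_attribute_values(lines, key):
    """Tally the values of one attribute key over 'transcript' records."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # malformed line, or not a transcript feature
        attrs = dict(ATTR_RE.findall(fields[8]))
        if key in attrs:
            counts[attrs[key]] += 1
    return counts


# Two invented records in Ensembl GTF layout, for illustration only.
example = [
    '#!genome-build R64-1-1\n',
    'XII\tensembl\ttranscript\t100\t200\t.\t-\t.\t'
    'gene_id "G1"; transcript_id "T1"; transcript_biotype "rRNA";\n',
    'I\tensembl\ttranscript\t300\t400\t.\t+\t.\t'
    'gene_id "G2"; transcript_id "T2"; transcript_biotype "protein_coding";\n',
]

print(count_attribute_values(example, "transcript_biotype"))
print(count_attribute_values(example, "transcript_type"))
```

On the real file you would pass an open handle instead of the list, e.g. `gzip.open("Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz", "rt")`. If "rRNA" shows up under "transcript_biotype" but nothing comes back for "transcript_type", the problem is the field name rather than missing annotation.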

I think you can use the "transcript.type.field" parameter to specify which GTF field you want to use instead of "transcript_type".

Otherwise, you could run something like "perl -i -pe 's/transcript_biotype/transcript_type/' Saccharomyces_cerevisiae.R64-1-1.86.gtf" on the unzipped GTF to change all instances of "transcript_biotype" to "transcript_type"; RNA-SeQC should then recognize it.
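
If you would rather not edit the file in place, here is a minimal Python sketch of the same rename. It reads the gzipped download, touches only the attributes column, and writes a new file. The function names and the output path are my own placeholders, and I have not run it against RNA-SeQC itself.

```python
import gzip
import re


def rename_attribute_key(line, old="transcript_biotype", new="transcript_type"):
    """Rename one attribute key in column 9 of a GTF line."""
    if line.startswith("#"):
        return line  # leave header / comment lines untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # not a standard GTF record
    # Match the key only where it is followed by a quoted value.
    pattern = r'\b%s(?= ")' % re.escape(old)
    fields[8] = re.sub(pattern, new, fields[8])
    return "\t".join(fields) + "\n"


def convert_gtf(src_gz, dst):
    """Write a copy of a gzipped GTF with the attribute key renamed."""
    with gzip.open(src_gz, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(rename_attribute_key(line))


# Hypothetical usage; the output file name is an assumption:
# convert_gtf("Saccharomyces_cerevisiae.R64-1-1.86.gtf.gz",
#             "Saccharomyces_cerevisiae.R64-1-1.86.transcript_type.gtf")
```

Keeping the original download untouched makes it easy to diff the two files and confirm that only the key name changed.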

Ah, this makes sense -- thanks for the help!
