Annotation problem while using Rsubread
0
1
Entering edit mode
5.6 years ago
gokberk ▴ 90

Hi all,

I've been trying to read some bam files using Rsubread. As for the annotation file, I converted a gff file to gtf using gffread. Here is how my gtf file looks like:

NC_012670.1 RefSeq  exon    543 614 .   +   .   transcript_id "rna76575";
NC_012670.1 RefSeq  exon    615 1562    .   +   .   transcript_id "rna76576";
NC_012670.1 RefSeq  exon    1563    1631    .   +   .   transcript_id "rna76577";
NC_012670.1 RefSeq  exon    1632    3191    .   +   .   transcript_id "rna76578";
NC_012670.1 RefSeq  exon    3192    3266    .   +   .   transcript_id "rna76579";
NC_012670.1 RefSeq  CDS 3269    4223    .   +   0   transcript_id "gene33355"; gene_id "gene33355"; gene_name "ND1";
NC_012670.1 RefSeq  exon    4224    4291    .   +   .   transcript_id "rna76580";
NC_012670.1 RefSeq  exon    4289    4360    .   -   .   transcript_id "rna76581";
NC_012670.1 RefSeq  exon    4362    4429    .   +   .   transcript_id "rna76582";
NC_012670.1 RefSeq  CDS 4430    5471    .   +   0   transcript_id "gene33356"; gene_id "gene33356"; gene_name "ND2";
NC_012670.1 RefSeq  exon    5472    5538    .   +   .   transcript_id "rna76583";
NC_012670.1 RefSeq  exon    5546    5614    .   -   .   transcript_id "rna76584";
NC_012670.1 RefSeq  exon    5616    5688    .   -   .   transcript_id "rna76585";
NC_012670.1 RefSeq  exon    5721    5782    .   -   .   transcript_id "rna76586";
NC_012670.1 RefSeq  exon    5783    5853    .   -   .   transcript_id "rna76587";
NC_012670.1 RefSeq  CDS 5858    7396    .   +   0   transcript_id "gene33357"; gene_id "gene33357"; gene_name "COX1";
NC_012670.1 RefSeq  exon    7398    7469    .   -   .   transcript_id "rna76588";
NC_012670.1 RefSeq  exon    7471    7538    .   +   .   transcript_id "rna76589";
NC_012670.1 RefSeq  CDS 7540    8223    .   +   0   transcript_id "gene33358"; gene_id "gene33358"; gene_name "COX2";
NC_012670.1 RefSeq  exon    8297    8362    .   +   .   transcript_id "rna76590";
NC_012670.1 RefSeq  CDS 8364    8570    .   +   0   transcript_id "gene33359"; gene_id "gene33359"; gene_name "ATP8";
NC_012670.1 RefSeq  CDS 8525    9205    .   +   0   transcript_id "gene33360"; gene_id "gene33360"; gene_name "ATP6";
NC_012670.1 RefSeq  CDS 9205    9988    .   +   0   transcript_id "gene33361"; gene_id "gene33361"; gene_name "COX3";

So, I run the following command in R:

rsubread.counts<-featureCounts(files=E08_bam,
                               annot.ext="/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",
                               isGTFAnnotationFile = TRUE,
                               GTF.featureType="exon",
                               GTF.attrType="gene_id",
                               useMetaFeatures = TRUE,
                               allowMultiOverlap = FALSE)

And receive the following error:

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Macaca_fascicularis_5.0_genomic.gtf ...               ||

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
An example of attributes included in your GTF annotation is 'transcript_id "id177006"; gene_name "LOC102131469";' 
The program has to terminate.

Error in featureCounts(files = E08_bam, annot.ext = "/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",  : 
  No counts were generated.

I was wondering what exactly is the problem with my gtf file (apparently gene_id is missing for some rows, but is that it) and how could I fix it. I'd be more than happy if you could help me.

Cheers, Gökberk

annotation gtf gff • 2.4k views
ADD COMMENT
1
Entering edit mode

Dear Goekberg, your GTF doesn't seem to have gene annotation at all - only CDS and exons (as seen in col 3). Besides the exons not having gene_ids, the exons don't match the CDS ranges at all. This seems very odd. In addition, exons and CDS are second or third level annotation - a gene should be at the lowest level. I'd suggest you review your gff file, either the conversion was wrong or the GFF is invalid in the first place.

ADD REPLY

Login before adding your answer.

Traffic: 1927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6