Entering edit mode
5.6 years ago
gokberk
▴
90
Hi all,
I've been trying to read some bam files using Rsubread. As for the annotation file, I converted a gff file to gtf using gffread
. Here is how my gtf file looks like:
NC_012670.1 RefSeq exon 543 614 . + . transcript_id "rna76575";
NC_012670.1 RefSeq exon 615 1562 . + . transcript_id "rna76576";
NC_012670.1 RefSeq exon 1563 1631 . + . transcript_id "rna76577";
NC_012670.1 RefSeq exon 1632 3191 . + . transcript_id "rna76578";
NC_012670.1 RefSeq exon 3192 3266 . + . transcript_id "rna76579";
NC_012670.1 RefSeq CDS 3269 4223 . + 0 transcript_id "gene33355"; gene_id "gene33355"; gene_name "ND1";
NC_012670.1 RefSeq exon 4224 4291 . + . transcript_id "rna76580";
NC_012670.1 RefSeq exon 4289 4360 . - . transcript_id "rna76581";
NC_012670.1 RefSeq exon 4362 4429 . + . transcript_id "rna76582";
NC_012670.1 RefSeq CDS 4430 5471 . + 0 transcript_id "gene33356"; gene_id "gene33356"; gene_name "ND2";
NC_012670.1 RefSeq exon 5472 5538 . + . transcript_id "rna76583";
NC_012670.1 RefSeq exon 5546 5614 . - . transcript_id "rna76584";
NC_012670.1 RefSeq exon 5616 5688 . - . transcript_id "rna76585";
NC_012670.1 RefSeq exon 5721 5782 . - . transcript_id "rna76586";
NC_012670.1 RefSeq exon 5783 5853 . - . transcript_id "rna76587";
NC_012670.1 RefSeq CDS 5858 7396 . + 0 transcript_id "gene33357"; gene_id "gene33357"; gene_name "COX1";
NC_012670.1 RefSeq exon 7398 7469 . - . transcript_id "rna76588";
NC_012670.1 RefSeq exon 7471 7538 . + . transcript_id "rna76589";
NC_012670.1 RefSeq CDS 7540 8223 . + 0 transcript_id "gene33358"; gene_id "gene33358"; gene_name "COX2";
NC_012670.1 RefSeq exon 8297 8362 . + . transcript_id "rna76590";
NC_012670.1 RefSeq CDS 8364 8570 . + 0 transcript_id "gene33359"; gene_id "gene33359"; gene_name "ATP8";
NC_012670.1 RefSeq CDS 8525 9205 . + 0 transcript_id "gene33360"; gene_id "gene33360"; gene_name "ATP6";
NC_012670.1 RefSeq CDS 9205 9988 . + 0 transcript_id "gene33361"; gene_id "gene33361"; gene_name "COX3";
So, I run the following command in R:
rsubread.counts<-featureCounts(files=E08_bam,
annot.ext="/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",
isGTFAnnotationFile = TRUE,
GTF.featureType="exon",
GTF.attrType="gene_id",
useMetaFeatures = TRUE,
allowMultiOverlap = FALSE)
And receive the following error:
//================================= Running ==================================\\
|| ||
|| Load annotation file Macaca_fascicularis_5.0_genomic.gtf ... ||
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'transcript_id "id177006"; gene_name "LOC102131469";'
The program has to terminate.
Error in featureCounts(files = E08_bam, annot.ext = "/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf", :
No counts were generated.
I was wondering what exactly is the problem with my gtf file (apparently gene_id is missing for some rows, but is that it) and how could I fix it. I'd be more than happy if you could help me.
Cheers, Gökberk
Dear Goekberg, your GTF doesn't seem to have gene annotation at all - only CDS and exons (as seen in col 3). Besides the exons not having
gene_ids
, the exons don't match the CDS ranges at all. This seems very odd. In addition, exons and CDS are second or third level annotation - a gene should be at the lowest level. I'd suggest you review your gff file, either the conversion was wrong or the GFF is invalid in the first place.