No reads assigned to bacterial genome with featureCounts (Rsubread)
1
0
Entering edit mode
5.2 years ago
Mathias_H ▴ 20

Hi everyone, I am programming an RNA-Seq Pipeline in R to process raw read data. I need to use multiple reference genomes from different organisms. For the mapping step I use STAR and everything worked great until the counting step with featureCounts (Rsubread). For one bacterial genome (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/009/445/GCA_000009445.1_ASM944v1) featureCounts is not able to assign any reads.

When I look at the count-matrix I see that there are no gene names, so I guess the problem is that featureCounts is not able to read the annotation file correctly.

I have tried to convert the .gff annotation file into a .gtf file with gffread and to use the .gtf file for counting, but I still do not get any assigned reads. (Also the .gtf file is much smaller (795 kB) than the .gff file (2.2 MB) and I don't know whether I lose important information here.)

As already mentioned in this post the problem may be the origin (NCBI) of the genome files and the ability of featureCounts to process them, but I do not use the build-in annotation ("mm10").

What do you guys think? Do I have to use a different program for the counting step for bacterial genomes?

tmp2 <- featureCounts(filesToCount, annot.ext = annGenbac, isGTFAnnotationFile = TRUE, genome = refGenbac, isPairedEnd = TRUE)

with .gff file:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
The attributes included in your GTF annotation are 'ID=id1;Parent=rna0;Note=ileT_1%2C len: 74 nt. First of two copies equivalent to ileT%2C len: 74 nt%2C from Mycobacterium tuberculosis strain H37Rv%2C (100.0%25 identity in 74 nt overlap). tRNA-Ile%2C anticodon gat;anticodon=(pos:10921..10923);gbkey=tRNA;gene=tRNA-Ile;product=tRNA-Ile' 

||    Features : 52                                                           ||
||    Meta-features : 1                                                       ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Loading FASTA contigs : /data/mathias/dualRNA-Seq/M_tuber/Genomes/Myco ... ||
||    1 contigs were loaded                                                   ||
||                                                                            ||
|| Process BAM file /data/mathias/dualRNA-Seq/M_tuber/Output/Read_filteri ... ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 160572                                                    ||
||    Successfully assigned reads : 0 (0.0%)                                  ||
||    Running time : 0.01 minutes

with .gtf file:

Features : 52                                                           ||
||    Meta-features : 52                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Loading FASTA contigs : /data/mathias/dualRNA-Seq/M_tuber/Genomes/Myco ... ||
||    1 contigs were loaded                                                   ||
||                                                                            ||
|| Process BAM file /data/mathias/dualRNA-Seq/M_tuber/Output/Read_filteri ... ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 160572                                                    ||
||    Successfully assigned reads : 0 (0.0%)                                  ||
||    Running time : 0.00 minutes
RNA-Seq R featureCounts • 2.1k views
ADD COMMENT
0
Entering edit mode
5.2 years ago
GenoMax 147k

Looks like your gene names are in Name=tRNA-Ala or gene= field. Have you tried to specify those fields with your -g option (you will need to check what the equivalent is for R command since that is what you are using) with featureCounts? Also make sure your reference and thus your alignment file contains AM408590.1 as the chromosome name.

ADD COMMENT
0
Entering edit mode

Thank you for your answer! Yes, I have tried to change the -g option (GTF.featureType in R) to "gene", "Name" and "ID". I still do not get any assigned reads. The headers of my alignment files (.bam) look like that:

    @HD VN:1.6  SO:coordinate
@SQ SN:ENA|AM408590|AM408590.1  LN:4374522
@PG ID:STAR PN:STAR VN:2.7.0e

Do you mean I have to change "ENA|AM408590|AM408590.1" to "AM408590.1"?

ADD REPLY
0
Entering edit mode

That is correct. That identifier has to match exactly the one in annotation file. You can edit the header of the SAM file (if that is what you are using to count).

ADD REPLY
0
Entering edit mode

Yes, that was the problem!! With changed BAM file headers...

@HD VN:1.6  SO:coordinate
@SQ SN:AM408590.1   LN:4374522
@PG ID:STAR PN:STAR VN:2.7.0e

...I finally got some assigned reads.

Thank you very much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 2516 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6