Dear Community,
I have been running featureCounts to count mapped reads in paired-end RNA-Seq data with the following parameters:
featureCounts -T 10 -t exon -s 2 -p -g gene_id -a <path_to_annotation.gtf> -o <path_to_output_directory> mapped.bam
I also wanted to find the associated biotype, so I ran featureCounts with the following command:
featureCounts -T 10 -t exon -s 2 -p -g gene_biotype -a <path_to_annotation.gtf> -o <path_to_output_directory> mapped.bam
Comparing the MultiQC reports for the two runs gave contradictory results: fewer reads were assigned when grouping by gene_id than when grouping by gene_biotype.
[Figure: Gene_Biotype assignment by number of reads]
[Figure: Gene_Biotype assignment by percentage]
[Figure: Gene_Id assignment by number of reads]
[Figure: Gene_Id assignment by percentage]
My assumption was that the number of assigned reads would be the same in both cases, so that I could then deduce the gene biotype for those assigned reads.
What am I missing here?
Thanks in advance.
Makes sense. Is there a way to work around this situation using featureCounts?
The only way I can think of is to use the following code:
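A minimal sketch of how such a workaround might look (not necessarily the code originally posted): keep the gene_id run as the source of counts, then look up each gene's biotype in the GTF and sum the counts per biotype. It assumes an Ensembl-style GTF that carries a `gene_biotype` attribute (GENCODE files call it `gene_type`), and a featureCounts table for a single BAM file, so the counts sit in the 7th column. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "ENSG00000000001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines):
    """Map gene_id -> gene_biotype using the 9th (attribute) column of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_biotype" in attrs:
            biotypes[attrs["gene_id"]] = attrs["gene_biotype"]
    return biotypes


def counts_per_gene(fc_lines):
    """Read gene_id -> count from a featureCounts table (one BAM, column 7)."""
    counts = {}
    for line in fc_lines:
        # Skip blank lines, the '# Program:...' comment and the 'Geneid' header.
        if not line.strip() or line.startswith("#") or line.startswith("Geneid"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = int(fields[6])
    return counts


def counts_per_biotype(counts, biotypes):
    """Sum gene-level counts per biotype; genes without a biotype go to 'unknown'."""
    totals = defaultdict(int)
    for gene_id, n in counts.items():
        totals[biotypes.get(gene_id, "unknown")] += n
    return dict(totals)
```

With both files opened as text, `counts_per_biotype(counts_per_gene(fc_file), gene_biotypes(gtf_file))` gives per-biotype totals that add up to the assigned reads of the gene_id run, so the two summaries stay consistent.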
Yes. This, or something like it, is what I would recommend.