Dear all, I use Subread in Bash for mapping and FeatureCounts for count quantification for my RNASeq data. I have used the same pipeline for other RNASeq datasets in the past and FeatureCounts gives count output for all the genes in the annotation file.
Recently I performed RNASeq 150 bp paired-ends reads in rice (Illumina). However, this time, after running featureCounts the count output file contained only 24000 genes out of total 42000 genes in the annotation file. I cannot understand why is it so? Does it suggest that mapping failed for the 18000 ignored genes? Does it suggest PCR amplification-based duplicated reads?
Can someone explain why is it so?
Regards.
Hi, Did you finally find out why it is so? I am running into the same problem with my own data and I can't find where this comes from...
do you mean that those genes are completely missing from the output or that they have 0 count ?
Thank you for the response. They are completely missing from the output. I also ran HTSeq-counts and it also did not provide the count data for all the genes.
Please provide code.
Here it is:
Subread index: subread-buildindex -F -o subreadindex ricegen.fa
Annotation file: gffread -E riceann.gff3 -T -o riceann.gtf
Subread mapping: subread-align -a riceann.gtf -F GTF -T 8 --sortReadsByCoordinates --multiMapping -i subreadindex -t 0 -r Q1read1.fq.gz -R Q1read2.fq.gz -o Q1.bam
FeatureCounts: featureCounts -p -T 8 -G ricegen.fa -t exon -g gene_id -a riceann.gtf -F GTF -M -O --fraction -o STRcounts.txt Q1.bam Q2.bam Q3.bam V1.bam V2.bam V3.bam
HTSeq-counts: htseq-count -t exon -i gene_id -f bam -m intersection-nonempty Q1.bam Q2.bam Q3.bam V1.bam V2.bam V3.bam riceann.gtf > result_file.txt
ricegen.fa and riceann.gtf are the genome and annotation files used.
Did you check the chromosome names in reference vs GTF?
Yes, the names are the same. Anyways it would have been highlighted while running subread if it was not the same.
is RNAseq library prep strand specific?
Yes, it is strand-specific.
How is read coverage for missing genes per exon (for eg take one and check)? Please also post featurecounts summary, if possible.
Thanks, I will check the bigwig on the genome browser and let you know.
Here is the FeatureCounts output screenshot image for the 6 bam files as input in the code above: https://ibb.co/PFf9WxQ
I also checked the read coverage for gene IDs missing from the FeatureCounts output vs annotation file. Here is the screenshot image: https://ibb.co/SmD3cK2
Can you also post the snapshot of the gtf file?
Please find it here
https://ibb.co/KK1kwb3
Hi viDub, Unfortunately not. Since I have little time left for this work, I shifted my analysis from SubRead to Salmon based reference transcriptome analysis where I do not need the FeatureCounts or HTSeqCount. I am sorry, but I cannot help you anymore at this point in time. Regards.