Hello,
I am working with non-human RNA-seq data for the first time, in this case from the rhesus monkey. I am getting very low alignment rates after featureCounts. The output looks like this:
> Load annotation file rheMac10refGene.gtf.sorted ...
Features : 47357
Meta-features : 6331
Chromosomes/contigs : 46
> Process BAM file *****.bam...
Single-end reads are included.
Assign alignments to features...
Total alignments : 84767064
Successfully assigned alignments : 8615189 (10.2%)
Running time : 4.04 minutes
So the steps I've done so far:
- Downloaded the entire assembly from ftp://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/*
- Converted the 2bit file to .fa using the twoBitToFa tool from rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/. This .fa file, after indexing, was the reference for HISAT2.
- Next, for the GTF file, I downloaded rheMac10.refGene.txt and converted it to GTF using the genePredToGtf tool. After cleaning, I supplied this as the GTF file for featureCounts.
Is this the way to go? My mapping rates after HISAT2 are high and comparable to the human genome ones. What am I doing wrong?
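For reference, the preparation steps above could be written down roughly as follows. This is only a sketch: the file names and the index base name are placeholders, not the ones actually used, and the commands are built as argument lists without being run. It also assumes the "cleaning" step may include dropping the leading `bin` column that UCSC table dumps such as refGene.txt carry, since genePredToGtf expects plain genePred rows.

```python
def strip_bin_column(line: str) -> str:
    """Drop the first tab-separated field (the UCSC 'bin' column) from a table row."""
    return line.split("\t", 1)[1]


def prep_commands(two_bit: str, fasta: str, index_base: str, genepred: str, gtf: str):
    """Return the preparation commands as argument lists; nothing is executed here.

    All five arguments are placeholder paths chosen by the caller.
    """
    return [
        # 2bit -> FASTA
        ["twoBitToFa", two_bit, fasta],
        # FASTA -> HISAT2 index files sharing the prefix index_base
        ["hisat2-build", fasta, index_base],
        # genePred (bin column already removed) -> GTF; "file" tells the tool
        # that the input is a file rather than a database table
        ["genePredToGtf", "file", genepred, gtf],
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.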
You have low assignment rates. Can you provide the actual featureCounts command you are using? Did you actually use a matched reference and annotation for the whole analysis? You can't mix and match those from different sources.
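One quick way to check whether a reference and an annotation agree is to compare their sequence names (for example `chr1` versus `1`). Here is a minimal sketch, assuming a plain-text FASTA and GTF; the function names are made up for illustration, and matching names alone do not prove the two files come from the same assembly.

```python
def fasta_names(path):
    """Collect sequence names: the first word after '>' on each FASTA header line."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report which sequence names are shared and which appear in only one file."""
    fa, gtf = fasta_names(fasta_path), gtf_names(gtf_path)
    return {"shared": fa & gtf, "gtf_only": gtf - fa, "fasta_only": fa - gtf}
```

If `gtf_only` is non-empty, features on those sequences can never receive reads aligned to that FASTA.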
Yes, my reference is from rheMac10 and the annotation (derived from rheMac10.refGene.txt) is from the same assembly. The reference genome (.fa) is only 2.9G and the GTF file is 17M, which is another part that confuses me. Is something wrong with this? This is the reference file I used
A couple of things to check first. Do you have rRNA contamination in the data? It is possible that a lot of your reads are multi-mapping; these are not counted by featureCounts by default. You should also add -p (if this is a paired-end dataset). You would want to count at the exon level but summarize at the gene level (something like -t exon -g gene_id). I don't recollect immediately if that is the default.
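To see which of these explanations applies, it can help to look at the `.summary` file that featureCounts writes next to its counts output, which breaks the alignments down by status (`Assigned`, `Unassigned_MultiMapping`, `Unassigned_NoFeatures` and so on). Below is a rough sketch of reading it, assuming the usual tab-separated layout with a `Status` header row followed by one row per status; the function names are illustrative only.

```python
def read_summary(path):
    """Parse a featureCounts .summary file into {bam_name: {status: count}}."""
    with open(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        bams = header[1:]  # one column per input BAM file
        result = {bam: {} for bam in bams}
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(header):
                continue  # skip blank or malformed rows
            for bam, value in zip(bams, fields[1:]):
                result[bam][fields[0]] = int(value)
    return result


def fractions(counts):
    """Turn one BAM's status counts into fractions of the total, dropping zero rows."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items() if n}
```

A large `Unassigned_MultiMapping` share would point toward multi-mapping reads such as rRNA, while a large `Unassigned_NoFeatures` share would suggest the reads fall outside the annotated features.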