Hi everyone, I am sequencing neurons isolated from rat. I'm working with a limited amount of material, so I'm using Nugen Ovation Universal library kits. The QC from FastQC looks OK. My library is paired-end with 75 bp reads. I ran STAR using this command:
STAR --runThreadN 3 \
--genomeDir ../genomes/rn6/Ensembl/star2 \
--readFilesCommand gunzip -c \
--readFilesIn ${R1} ${R2} \
--outFileNamePrefix starMapped/${job_name} \
--outSAMtype BAM Unsorted \
--seedSearchStartLmax 40 \
--outFilterScoreMinOverLread 0.5 \
--outFilterMatchNminOverLread 0.5
Uniquely mapped reads are ~70-90% for all samples, with multi-mapped reads at 7-15%.
I used the same GTF file to then make the count table using featureCounts using this command:
featureCounts -T 6 -p -t exon -g gene_name -a ../genomes/rn6/Ensembl/Rattus_norvegicus.Rnor_6.0.93.gtf -o combined_counts.txt *.bam
But only 33% of my reads are assigned: Unassigned_MultiMapping accounts for ~30% of reads and Unassigned_NoFeatures for the remaining third. I don't understand how the multi-mapping and mapping rates can differ so much between the two packages, given that I am using the same GTF file. What's going on here? And why is the number of unassigned reads so high with featureCounts?
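For anyone wanting to look at the same breakdown, here is a minimal sketch that turns the summary file featureCounts writes next to its output (with the -o option above, that would be combined_counts.txt.summary) into per-sample percentages. The function name summarize_fc is hypothetical, and the sketch assumes the usual tab-separated layout with a Status header row. As far as I know, the summary counts alignments (fragments when -p is used) rather than reads, so a multi-mapping read contributes one entry per reported location; that alone could make its percentages differ from those in STAR's Log.final.out.

```shell
# Print each featureCounts status as a percentage of all counted alignments,
# one line per BAM column and non-zero category.
# Assumes a tab-separated .summary file whose first row is "Status<TAB>sample...".
summarize_fc() {
    awk -F'\t' '
        NR == 1 {
            # Header row: remember the sample (BAM) name of every column
            for (i = 2; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        {
            # Category rows: store counts and accumulate per-sample totals
            status[NR] = $1
            for (i = 2; i <= NF; i++) {
                count[NR, i] = $i
                total[i] += $i
            }
            nrow = NR
        }
        END {
            for (i = 2; i <= ncol; i++)
                for (r = 2; r <= nrow; r++)
                    if (count[r, i] > 0)
                        printf "%s\t%s\t%.1f%%\n", name[i], status[r], 100 * count[r, i] / total[i]
        }' "$1"
}

# Usage: summarize_fc combined_counts.txt.summary
```

Comparing the Assigned and Unassigned_* rows across samples shows quickly whether the loss is dominated by multi-mapping alignments or by alignments outside annotated exons.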
I have a similar problem to yours. I also used the Rnor6 GTF annotation from Ensembl for alignment and quantification. The report from read_distribution.py in the RSeQC package indicates that ~20-30% of reads were aligned to intronic or intergenic regions. You can also try it on your results: read_distribution.py: http://rseqc.sourceforge.net/#read-distribution-py
I suspect this is because the rat annotation file is still incomplete compared with human and mouse.
You can try StringTie; it will assemble novel transcripts based on the genome and quantify both known and novel transcripts.
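A rough sketch of what that could look like for one sample, assuming samtools and stringtie are installed. The helper name stringtie_cmds and the output file names are hypothetical; the helper only prints the commands so they can be checked before running. Since the STAR command above writes an unsorted BAM, the sketch sorts it by coordinate first, which StringTie expects.

```shell
# Hypothetical helper: prints the two commands for one sample instead of
# running them, so they can be inspected first (pipe the output to sh to run).
# Simple file names without spaces are assumed.
stringtie_cmds() {
    bam=$1                          # unsorted BAM written by STAR
    gtf=$2                          # same annotation used for alignment
    sorted=${bam%.bam}.sorted.bam   # coordinate-sorted copy
    # StringTie expects coordinate-sorted alignments
    printf 'samtools sort -@ 3 -o %s %s\n' "$sorted" "$bam"
    # -G supplies the reference annotation that guides the assembly
    printf 'stringtie %s -G %s -o %s -p 3\n' "$sorted" "$gtf" "${bam%.bam}.stringtie.gtf"
}

# Usage: stringtie_cmds sample.bam Rattus_norvegicus.Rnor_6.0.93.gtf | sh
```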
Thank you! I have used stringtie. But I still want to know why so many reads were aligned to intronic regions. Is this normal for human data? Is there any way to determine whether this is due to the sequencing data or to my analysis pipeline? (I just analyzed a set of human data today. The unique mapping rate looks good (~90%), but featureCounts can only assign ~40% of reads successfully. The report from read_distribution.py shows about 20% of reads.) Thanks!
You can obtain counts directly from STAR with --quantMode GeneCounts; this quantification should be similar to that obtained with featureCounts. You didn't use -s in your featureCounts command. Are you sure the Nugen Ovation kit results in an unstranded library?
Good point (see this, under Data analysis):