7.4 years ago
xqyjxau
Hi all! Here is my question: I have RNA-Seq data (human samples, FASTQ format). I built the STAR genome index from GRCh38 with the GRCh38.89.gtf annotation, and then used featureCounts with the same GRCh38.89.gtf file to count the reads. Here is my command:
featureCounts -T 5 -a <gtf file location> -t exon -g gene_id -o tableCounts *.sam
The fraction of successfully assigned reads seems extremely low, roughly 15% to 20%. What is going wrong?
//================================= Running ==================================\\
|| ||
|| Load annotation file
|| Features : 1193949 ||
|| Meta-features : 58233 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process SAM file Sample1. ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
|| Total reads : 20566972 ||
|| Successfully assigned reads : 4086564 (19.9%) ||
|| Running time : 0.10 minutes ||
|| ||
|| Process SAM file Sample2_12AH_0022_Aligned.out.sam... ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
|| Total reads : 22822591 ||
|| Successfully assigned reads : 4176597 (18.3%) ||
|| Running time : 0.11 minutes ||
|| ||
|| Process SAM file Sample4_11NH_0240_tumor_Aligned.out.sam... ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
|| Total reads : 19564290 ||
|| Successfully assigned reads : 2993070 (15.3%) ||
|| Running time : 0.10 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
|| Summary of counting results can be found in file "tableCounts" ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
Open your alignment files in IGV and check where the reads align. If many reads fall outside annotated exons (for example in introns or intergenic regions), featureCounts cannot assign them, and that would explain the low rate.
Also run samtools flagstat on the alignment files to see how many reads aligned overall. A low mapping rate is another possible explanation for the low counts.
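Before opening IGV, it may also help to look at the summary file that featureCounts writes next to the count table (here it should be tableCounts.summary). It lists, per input file, how many reads were assigned and how many fell into each unassigned status. Below is a minimal Python sketch for turning that table into per-sample fractions. The function names are my own, and the exact status labels depend on the featureCounts version, so the code does not hard-code them.

```python
import csv
import sys


def read_featurecounts_summary(lines):
    """Parse a featureCounts .summary table into {sample: {status: count}}.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)      # first cell is "Status", then one column per input file
    samples = header[1:]
    table = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue           # tolerate blank lines
        status = row[0]
        for sample, value in zip(samples, row[1:]):
            table[sample][status] = int(value)
    return table


def fractions(counts):
    """Return the share of each non-zero status for one sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items() if n > 0}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python summary_breakdown.py tableCounts.summary
    with open(sys.argv[1], newline="") as handle:
        summary = read_featurecounts_summary(handle)
    for sample, counts in summary.items():
        print(sample)
        for status, share in sorted(fractions(counts).items(), key=lambda kv: -kv[1]):
            print(f"  {status}: {share:.1%}")
```

If most of the unassigned reads sit in a "no features" type of status, the reads are landing outside the annotated exons. If they sit in an unmapped or multi-mapping status, the problem is more likely upstream in the alignment.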
If many reads do align outside of coding regions, should I proceed with whatever counts I get, or should I discard the sample?
If you are using STAR to map, you can get the counts directly from it by adding the option --quantMode GeneCounts.
Was the sequencing strand-specific?
No, it's not strand-specific.
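If STAR is rerun with --quantMode GeneCounts, the strandedness can also be checked from the data rather than from memory. STAR then writes a ReadsPerGene.out.tab file per sample with four columns: gene ID, unstranded counts, counts for reads on the same strand as the gene, and counts for reads on the opposite strand. The first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are bookkeeping. A rough Python sketch that compares the two stranded columns follows. The function names and the 0.8 cutoff are my own arbitrary assumptions, not anything defined by STAR.

```python
import csv
import sys


def strand_totals(lines):
    """Sum the three count columns of a STAR ReadsPerGene.out.tab file over all genes."""
    totals = {"unstranded": 0, "forward": 0, "reverse": 0}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4 or row[0].startswith("N_"):
            continue  # skip blank/short lines and the four N_* bookkeeping rows
        totals["unstranded"] += int(row[1])
        totals["forward"] += int(row[2])
        totals["reverse"] += int(row[3])
    return totals


def guess_strandedness(totals, cutoff=0.8):
    """Heuristic guess; `cutoff` is an arbitrary threshold chosen for illustration."""
    stranded = totals["forward"] + totals["reverse"]
    if stranded == 0:
        return "undetermined"
    forward_share = totals["forward"] / stranded
    if forward_share >= cutoff:
        return "forward"
    if forward_share <= 1 - cutoff:
        return "reverse"
    return "unstranded"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is an assumption): python strand_check.py Sample_ReadsPerGene.out.tab
    with open(sys.argv[1], newline="") as handle:
        totals = strand_totals(handle)
    print(totals, guess_strandedness(totals))
```

For an unstranded library the two stranded columns should come out roughly equal, and the featureCounts default of -s 0 is appropriate. If one column clearly dominates, the library is probably stranded and featureCounts would need -s 1 or -s 2 instead.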