Hi,
I am doing a featureCounts on 4 RNA-seq samples, code as found below. I am counting on the "gene" from gencode v33 as seen in the gtf file. Reason why I am doing this is to counter-check the counting process against the aligner's (STAR in this case) summary output.
featureCounts -a gencode.v33.primary_assembly.annotation.gtf -o out.txt -O --fraction -B -t "gene" -s 2 -T 8 --primary -p 504-709_S21Aligned.sortedByCoord.out.bam 504-710_S22Aligned.sortedByCoord.out.bam 505-702_S26Aligned.sortedByCoord.out.bam 505-703_S27Aligned.sortedByCoord.out.bam
Am I right to assume that the above code will correspond to the uniquely mapped reads summary in STAR? The counting above is very close but not exactly same to the reported uniquely mapped reads from STAR, hence my assumption.
If so, what does the --primary tag do in this case, as I assume it will count the multi-mapped reads which is given a primary tag.
Regards Solyris
This
is not entirely true (at least for featureCounts v2.0.0). featureCounts recognizes multimapped reads using the
NH:i
tag. If you use--primary
without-M
featureCounts will count only uniquely mapped reads (NH:i:1) and completely ignore multimapped reads regardless on the SAM primary tag. Using--primary
in this scenario doesn't have any effect on counting whatsoever.In case the aligner doesn't output the
NH:i
flag, featureCounts doesn't recognize unique/multimapped and will count all the alignments. In this case, using--primary
will help you to count each read only once.If you want to count both uniquely and multimapped reads (and your aligner outputs the
NH:i
flag) but each of the reads only once you can use--primary -M
.