Question

Does featureCounts with --primary count only uniquely mapped reads and exclude multi-mapped reads?

5.5 years ago

solyris83 ▴ 40

Hi,

I am running featureCounts on 4 RNA-seq samples with the command below, counting at the "gene" feature level of the GENCODE v33 GTF file. I am doing this to cross-check the counting against the aligner's (STAR in this case) summary output.

    featureCounts -a gencode.v33.primary_assembly.annotation.gtf -o out.txt \
        -O --fraction -B -t "gene" -s 2 -T 8 --primary -p \
        504-709_S21Aligned.sortedByCoord.out.bam 504-710_S22Aligned.sortedByCoord.out.bam \
        505-702_S26Aligned.sortedByCoord.out.bam 505-703_S27Aligned.sortedByCoord.out.bam

Am I right to assume that the command above gives counts corresponding to the uniquely mapped reads in STAR's summary? The counts are very close to, but not exactly the same as, the uniquely mapped reads reported by STAR, hence my assumption.

If so, what does the --primary option do in this case? I assumed it would count multi-mapped reads whose alignment is flagged as primary.

Regards Solyris

Tags: featureCounts, STAR

Question written 5.5 years ago by solyris83 ▴ 40 • updated 6 months ago by rohitsatyam102 ▴ 940

Answer 1 · 2020-02-04

5.5 years ago

Tomás Di Domenico ▴ 30

featureCounts does not count multi-mappers by default. In your case, with "--primary" specified, it will count each multi-mapping read once (it counts the alignment marked as primary and ignores all the others, which are marked as secondary). So the counts from your command should differ slightly from STAR's uniquely mapped reads. To ignore multi-mappers altogether, simply remove the "--primary" parameter.

You may also see slight differences depending on how the two programs count, for example how reads overlapping several features (or none) are handled.
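One way to see where such a gap comes from is to put STAR's unique-read total next to the featureCounts status rows. A minimal sketch, assuming featureCounts was run with `-o out.txt` (so per-sample status counts are in `out.txt.summary`) and that STAR's per-sample `Log.final.out` is at hand; the helper names `star_unique` and `fc_status` and all numbers in the two toy files are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare STAR's unique-read total with featureCounts' status rows.
# The two toy files and their numbers are hypothetical; point star_log and
# fc_summary at your own <prefix>Log.final.out and out.txt.summary instead.
set -euo pipefail
export LC_ALL=C

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
star_log="$tmp/Log.final.out"
fc_summary="$tmp/out.txt.summary"

printf '                   Uniquely mapped reads number |\t1000\n' > "$star_log"
printf 'Status\tsample.bam\nAssigned\t900\nUnassigned_Unmapped\t0\nUnassigned_NoFeatures\t80\nUnassigned_Ambiguity\t20\n' > "$fc_summary"

# STAR lines look like "<label> |<TAB><value>": split on "|" and strip blanks.
star_unique() {
  awk -F'|' '/Uniquely mapped reads number/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Value of one status row for the first sample (column 2) of a summary file.
fc_status() {
  awk -F'\t' -v row="$2" '$1 == row { print $2 }' "$1"
}

unique=$(star_unique "$star_log")
assigned=$(fc_status "$fc_summary" Assigned)
# prints: STAR unique: 1000, featureCounts Assigned: 900, gap: 100
echo "STAR unique: $unique, featureCounts Assigned: $assigned, gap: $((unique - assigned))"
# Non-zero Unassigned_* rows show where the remaining reads went.
awk -F'\t' '$1 ~ /^Unassigned_/ && $2 > 0 { print $1, $2 }' "$fc_summary"
```

With several BAM files, each sample is a further column of `out.txt.summary`, so `$2` in `fc_status` would need to become the matching column.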

Answer written 5.5 years ago by Tomás Di Domenico ▴ 30

This statement:

> with "--primary" specified, it will count each multi-mapping read once

is not entirely true (at least for featureCounts v2.0.0). featureCounts recognizes multi-mapped reads using the NH:i tag. If you use --primary without -M, featureCounts will count only uniquely mapped reads (NH:i:1) and completely ignore multi-mapped reads, regardless of the SAM primary flag. Using --primary in this scenario has no effect on counting whatsoever.

If the aligner does not output the NH:i tag, featureCounts cannot distinguish uniquely mapped from multi-mapped reads and will count all alignments. In that case, using --primary helps you count each read only once.

If you want to count both uniquely mapped and multi-mapped reads (and your aligner outputs the NH:i tag), but count each read only once, you can use --primary -M.
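The rules in this reply can be written down as a toy model. This is a minimal sketch, not featureCounts' source code: it encodes the v2.0.0 behaviour stated above, looks only at the NH:i tag and FLAG bit 0x100, ignores feature assignment, pairing and all other filters, and runs on four hypothetical SAM records; `kept` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Toy model of the read-level filters described above (featureCounts v2.0.0):
# NH-based unique/multi detection, -M and --primary. Not featureCounts code.
set -euo pipefail
export LC_ALL=C

# Hypothetical SAM body: r1 and r3 are unique (NH:i:1); r2 has one primary
# (FLAG 0) and one secondary (FLAG 256) alignment.
SAM=$'r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1
r2\t0\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2
r2\t256\tchr2\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2
r3\t16\tchr1\t800\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1'

# The same records as an aligner without NH tags would write them.
SAM_NO_NH=$(printf '%s\n' "$SAM" | cut -f1-11)

# kept PRIMARY MULTI (each 0 or 1): number of alignments that get counted.
kept() {
  awk -F'\t' -v primary="$1" -v multi="$2" '
    {
      nh = -1                                  # -1 = no NH tag on this line
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
      secondary = int($2 / 256) % 2            # SAM FLAG bit 0x100
      if (multi == 0 && nh > 1) next           # multi-mappers skipped without -M
      if (primary == 1 && secondary) next      # --primary skips secondary lines
      n++
    }
    END { print n + 0 }'
}

# prints: NH present: default 2, --primary 2, --primary -M 3
echo "NH present: default $(printf '%s\n' "$SAM" | kept 0 0), --primary $(printf '%s\n' "$SAM" | kept 1 0), --primary -M $(printf '%s\n' "$SAM" | kept 1 1)"
# prints: NH absent: default 4, --primary 3
echo "NH absent: default $(printf '%s\n' "$SAM_NO_NH" | kept 0 0), --primary $(printf '%s\n' "$SAM_NO_NH" | kept 1 0)"
```

In this model --primary changes nothing while NH tags are present and -M is absent; it only matters once -M is set or the NH tag is missing.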

Reply written 4.0 years ago by opplatek ▴ 300

It appears that using --primary together with -M causes -M to be ignored, according to the description of the primaryOnly argument in the Rsubread documentation:

> logical indicating if only primary alignments should be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. If TRUE, all primary alignments in a dataset will be counted no matter they are from multi-mapping reads or not (ie. countMultiMappingReads is ignored).

So would you recommend using --fraction -M rather than --primary -M? I confirmed this by checking the counts: they were not fractional at all when I used --fraction --primary -M together.

PS: I am new to featureCounts so please help me here.
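To make the --fraction -M versus --primary -M trade-off concrete, here is a toy sketch (hypothetical records and helper name `per_locus`, with the chromosome standing in for a feature). The "fraction" mode gives each of a read's NH alignments a weight of 1/NH, which is how --fraction treats multi-mapping reads under -M; the "primary" mode gives the whole count to the primary alignment, mirroring the whole-number counts reported in this comment for --fraction --primary -M. It is not featureCounts code and ignores feature overlap.

```shell
#!/usr/bin/env bash
# Toy comparison of --fraction -M and --primary -M weighting (not featureCounts
# code). Chromosome names stand in for features; records are hypothetical.
set -euo pipefail
export LC_ALL=C

# r1, r3 unique; r2 aligns to chr1 (primary) and chr2 (secondary), NH:i:2.
SAM=$'r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1
r2\t0\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2
r2\t256\tchr2\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2
r3\t16\tchr1\t800\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1'

# per_locus MODE: MODE is "fraction" or "primary"; prints "<chrom><TAB><count>".
per_locus() {
  awk -F'\t' -v mode="$1" '
    {
      nh = 1                                   # treat a missing NH tag as unique
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
      secondary = int($2 / 256) % 2            # SAM FLAG bit 0x100
      if (mode == "fraction") w = 1 / nh       # each alignment carries 1/NH
      else w = secondary ? 0 : 1               # whole read goes to the primary hit
      sum[$3] += w
    }
    END { for (c in sum) printf "%s\t%g\n", c, sum[c] }' | sort
}

echo "--fraction -M:"; printf '%s\n' "$SAM" | per_locus fraction   # chr1 2.5, chr2 0.5
echo "--primary -M:";  printf '%s\n' "$SAM" | per_locus primary    # chr1 3, chr2 0
```

Both modes count each read once in total (3 here); they differ only in where r2's count lands: split evenly across its loci, or given entirely to whichever alignment the aligner flagged as primary.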

Reply written 6 months ago by rohitsatyam102 ▴ 940