featureCounts strand specificity
2
7
Entering edit mode
9.1 years ago
tonja.r ▴ 600

I was working with featureCounts and had three runs while changing only one parameter: the strand specificity. I would expect that the reads from -s 1 and the reads from -s 2 would sum up to the the number of reads outputted by -s 0.

featureCounts --minReadOverlap 50 -s 0 -Q 1 -T 12 -a matrix_gene.saf -F SAF -O -o $OUT/total_counts.txt file.bam
featureCounts --minReadOverlap 50 -s 1 -Q 1 -T 12 -a matrix_gene.saf -F SAF -O -o $OUT/total_counts.txt file.bam
featureCounts --minReadOverlap 50 -s 2 -Q 1 -T 12 -a matrix_gene.saf -F SAF -O -o $OUT/total_counts.txt file.bam

However, the results differ:

==> minReadOverlap50_strand1/total_counts.txt.summary <==
Status    WEN1_6558.sort.bam    WNN1_6545.sort.bam WNN2_6550.sort.bam
Assigned    4643945    8863560    8859072

==> minReadOverlap50_strand2/total_counts.txt.summary <==
Status    WEN1_6558.sort.bam    WNN1_6545.sort.bam  WNN2_6550.sort.bam
Assigned    4775184    9123446    9158397

==> minReadOverlap50/total_counts.txt.summary <==
Status   WEN1_6558.sort.bam    WNN1_6545.sort.bam   WNN2_6550.sort.bam
Assigned    8356549    15881043    15933323

4643945+4775184=9419129 != 8356549

Why don't the reads sum up?

alignment • 18k views
ADD COMMENT
8
Entering edit mode
9.1 years ago

You misunderstood the role of the -s option:

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.

When you choose either option 1 or 2, reads mapping on both - and + strands are taken into accounts. So it is not expected to sum up. The difference between -s1 and -s2 is that the strands are reversed in -s2. '-' becomes'+' and '+' becomes '-' and as a consequence, '+' reads are assigned to '-' features.

ADD COMMENT
0
Entering edit mode

I have ChIP-seq data, so I have "+" and "-" reads. I have regions where I want to count my reads. Regions can overlap and can be on "+" and on "-" strands. I do not care about the strands, meaning if any read falls into a feature on "-" strand, it should be counted, if any read falls into a feature on "+" strand, it should be counted. Does it mean that I have to use -s0 and not care about strand specificity?

ADD REPLY
0
Entering edit mode

Exactly ! -s1 or -s2 are mostly used with strand-specific RNA-seq data. With ChIP-seq data you'll usually use -s0.

ADD REPLY
1
Entering edit mode

One small confusion is there, I am using paired-end RNA-seq data and for quantification, Platform for my sequencing was 'Illumina NextSeq 500 (Homo sapiens)' I have used the following command

featureCounts -p -t exon -g gene_id -a hg38ucsc.gtf -o SRRXXX.bam.txt SRRXXX.bam

I want to see counts of both the strand (reverse and forward) in my output, Which option I am supposed to add in my command -s 0 or -s 1 or -s 2 option?

ADD REPLY
0
Entering edit mode

If you want to count on both strand, use -s 0 (which is the default behaviour).

ADD REPLY
1
Entering edit mode
9.1 years ago
OD ▴ 10

Hi,

inspired by the post: A: featurecounts vs samtools view

Maybe your saf file contains overlapping features in different directions. Hence featureCounts will assign reads depending on the -s parameter either to the one or the other feature (i.e. counting the same read twice) but will assign reads only once to one feature in -s 0.

Could you maybe check for those occurrences in your saf file and run featureCounts on only this portion of the file for a test?

ADD COMMENT
0
Entering edit mode

You can also try running featureCounts with the -O option. It will count reads overlapping features once by feature.

ADD REPLY
0
Entering edit mode

I did it (in commands above I have -O option) and still something was wring there.

ADD REPLY
0
Entering edit mode

Yes, my bad, I now know what your issue is. Replying in "answers"

ADD REPLY
0
Entering edit mode

As mentioned in the comment above, I can run the featureCount with -O parameter. And I did run it with this parameter (see commands), so I expected that with -s 0 the reads would be counted for each feature, even it they overlap. I also had a test example with two different features on negative and positive strand overlapping one read. With -O and -s 0 the read was counted twice: once for positive and once for negative strand

ADD REPLY
0
Entering edit mode

enter image description here

ADD REPLY
0
Entering edit mode

hi, with -O and -s 0, the read was counted twice, but if with -O and -s 1, the read will be counted once?

ADD REPLY

Login before adding your answer.

Traffic: 1366 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6