Weird discrepancy of assigned reads in featureCounts when run on paired-end data

Entering edit mode

8.0 years ago

Michael 55k

I have noticed a weird discrepancy between the output of featureCounts when run in paired-end mode vs. single end mode on a paired-end sample.

When comparing the percentage of assigned features I get 45% assigned reads for single-end, which seems to be ok from other posts][1], vs. 5% for paired end mode, the only difference between the two count runs being parameter -p. I have seen a few posts like this one , reporting low assignment rate, and they mostly seem to be on paired end data, answers seem to suggest that something is wrong with the settings or protocol. However, what if that is not the case?

My data seems ok, with ~100% properly paired reads

Single end counting command:

featureCounts -T64  -s 1 -M -a /export/jonassenfs/michaeld/licebase/genomedata/lsal76ribo/lsalGM.gff3 -o featurecounts-test.txt -g Parent -t exon  -B -R 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam > featurecounts.log

Paired-end counting command on the same file yielding only 5% assigned reads:

featureCounts -T64 -p -s 1 -M -a /export/jonassenfs/michaeld/licebase/genomedata/lsal76ribo/lsalGM.gff3 -o featurecounts-test-p.txt -g Parent -t exon  -B -R 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam > featurecounts-p.log

Here is the output of samtools flagstat, showing 100% paired reads:

167713978 + 0 in total (QC-passed reads + QC-failed reads)
3459569 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
167713978 + 0 mapped (100.00% : N/A)
164254409 + 0 paired in sequencing
82127291 + 0 read1
82127118 + 0 read2
164254236 + 0 properly paired (100.00% : N/A)
164254236 + 0 with itself and mate mapped
173 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

	Run in single end mode:
	========== _____ _ _ ____ _____ ______ _____
	===== / ____\| \| \| \| _ \\| __ \\| ____\| /\ \| __ \
	===== \| (___ \| \| \| \| \|_) \| \|__) \| \|__ / \ \| \| \| \|
	==== \___ \\| \| \| \| _ <\| _ /\| __\| / /\ \ \| \| \| \|
	==== ____) \| \|__\| \| \|_) \| \| \ \\| \|____ / ____ \\| \|__\| \|
	========== \|_____/ \____/\|____/\|_\| \_\______/_/ \_\_____/
	v1.5.2

	//========================== featureCounts setting ===========================\
	\|\| \|\|
	\|\| Input files : 1 BAM file \|\|
	\|\| P 10_S48_R1_001.fastq.gzAligned.sortedByCoor ... \|\|
	\|\| \|\|
	\|\| Output file : featurecounts-test.txt \|\|
	\|\| Summary : featurecounts-test.txt.summary \|\|
	\|\| Annotation : /export/jonassenfs/michaeld/licebase/genomed ... \|\|
	\|\| Assignment details : <input_file>.featureCounts \|\|
	\|\| (Note that files are saved to the output directory) \|\|
	\|\| \|\|
	\|\| Dir for temp files : ./ \|\|
	\|\| \|\|
	\|\| Threads : 16 \|\|
	\|\| Level : meta-feature level \|\|
	\|\| Paired-end : no \|\|
	\|\| Strand specific : stranded \|\|
	\|\| Multimapping reads : counted \|\|
	\|\| Multi-overlapping reads : not counted \|\|
	\|\| Min overlapping bases : 1 \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//

	//================================= Running ==================================\
	\|\| \|\|
	\|\| Load annotation file /export/jonassenfs/michaeld/licebase/genomedata/l ... \|\|
	\|\| Features : 59324 \|\|
	\|\| Meta-features : 13594 \|\|
	\|\| Chromosomes/contigs : 2193 \|\|
	\|\| \|\|
	\|\| Process BAM file 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam... \|\|
	\|\| Paired-end reads are included. \|\|
	\|\| Assign reads to features... \|\|
	\|\| Total reads : 167713978 \|\|
	\|\| Successfully assigned reads : 75979398 (45.3%) \|\|
	\|\| Running time : 14.17 minutes \|\|
	\|\| \|\|
	\|\| Read assignment finished. \|\|
	\|\| \|\|
	\|\| Summary of counting results can be found in file "featurecounts-test.txt" \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//


	===== Run in paired-end mode:

	========== _____ _ _ ____ _____ ______ _____
	===== / ____\| \| \| \| _ \\| __ \\| ____\| /\ \| __ \
	===== \| (___ \| \| \| \| \|_) \| \|__) \| \|__ / \ \| \| \| \|
	==== \___ \\| \| \| \| _ <\| _ /\| __\| / /\ \ \| \| \| \|
	==== ____) \| \|__\| \| \|_) \| \| \ \\| \|____ / ____ \\| \|__\| \|
	========== \|_____/ \____/\|____/\|_\| \_\______/_/ \_\_____/
	v1.5.2

	//========================== featureCounts setting ===========================\
	\|\| \|\|
	\|\| Input files : 1 BAM file \|\|
	\|\| P 10_S48_R1_001.fastq.gzAligned.sortedByCoor ... \|\|
	\|\| \|\|
	\|\| Output file : featurecounts-test-p.txt \|\|
	\|\| Summary : featurecounts-test-p.txt.summary \|\|
	\|\| Annotation : /export/jonassenfs/michaeld/licebase/genomed ... \|\|
	\|\| Assignment details : <input_file>.featureCounts \|\|
	\|\| (Note that files are saved to the output directory) \|\|
	\|\| \|\|
	\|\| Dir for temp files : ./ \|\|
	\|\| \|\|
	\|\| Threads : 16 \|\|
	\|\| Level : meta-feature level \|\|
	\|\| Paired-end : yes \|\|
	\|\| Strand specific : stranded \|\|
	\|\| Multimapping reads : counted \|\|
	\|\| Multi-overlapping reads : not counted \|\|
	\|\| Min overlapping bases : 1 \|\|
	\|\| \|\|
	\|\| Chimeric reads : counted \|\|
	\|\| Both ends mapped : required \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//

	//================================= Running ==================================\
	\|\| \|\|
	\|\| Load annotation file /export/jonassenfs/michaeld/licebase/genomedata/l ... \|\|
	\|\| Features : 59324 \|\|
	\|\| Meta-features : 13594 \|\|
	\|\| Chromosomes/contigs : 2193 \|\|
	\|\| \|\|
	\|\| Process BAM file 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam... \|\|
	\|\| Paired-end reads are included. \|\|
	\|\| Assign fragments (read pairs) to features... \|\|
	\|\| \|\|
	\|\| WARNING: reads from the same pair were found not adjacent to each \|\|
	\|\| other in the input (due to read sorting by location or \|\|
	\|\| reporting of multi-mapping read pairs). \|\|
	\|\| \|\|
	\|\| Read re-ordering is performed. \|\|
	\|\| \|\|
	\|\| Total fragments : 83857077 \|\|
	\|\| Successfully assigned fragments : 4319884 (5.2%) \|\|
	\|\| Running time : 17.50 minutes \|\|
	\|\| \|\|
	\|\| Read assignment finished. \|\|
	\|\| \|\|
	\|\| Summary of counting results can be found in file "featurecounts-test-p.tx \|\|
	\|\| t" \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//


	STAR Log final out:

	Started job on \| May 31 15:17:16
	Started mapping on \| May 31 15:17:16
	Finished on \| May 31 15:38:34
	Mapping speed, Million of reads per hour \| 246.94

	Number of input reads \| 87662433
	Average input read length \| 148
	UNIQUE READS:
	Uniquely mapped reads number \| 80934353
	Uniquely mapped reads % \| 92.33%
	Average mapped length \| 148.30
	Number of splices: Total \| 18880378
	Number of splices: Annotated (sjdb) \| 14710103
	Number of splices: GT/AG \| 18348626
	Number of splices: GC/AG \| 39184
	Number of splices: AT/AC \| 3555
	Number of splices: Non-canonical \| 489013
	Mismatch rate per base, % \| 0.38%
	Deletion rate per base \| 0.01%
	Deletion average length \| 1.47
	Insertion rate per base \| 0.01%
	Insertion average length \| 1.35
	MULTI-MAPPING READS:
	Number of reads mapped to multiple loci \| 1192938
	% of reads mapped to multiple loci \| 1.36%
	Number of reads mapped to too many loci \| 0
	% of reads mapped to too many loci \| 0.00%
	UNMAPPED READS:
	% of reads unmapped: too many mismatches \| 0.00%
	% of reads unmapped: too short \| 6.29%
	% of reads unmapped: other \| 0.02%
	CHIMERIC READS:
	Number of chimeric reads \| 0
	% of chimeric reads \| 0.00%

view raw featureCounts.log hosted with ❤ by GitHub

featureCounts subreads rna-seq • 5.0k views

ADD COMMENT • link 8.0 years ago by Michael 55k

Entering edit mode

8.0 years ago

Devon Ryan 105k

Do you really want -s 1? I would think -s 2 is more likely for recent data.

ADD COMMENT • link 8.0 years ago by Devon Ryan 105k

Entering edit mode

Ooh, they just told me it was 'strand-specific paired end', running featureCounts again....

Successfully assigned fragments : 73783106 (88.0%)

Thank you very much!

ADD REPLY • link 8.0 years ago by Michael 55k

Entering edit mode

Glad it was that easy :) Everything these days uses the "reverse" setting.

ADD REPLY • link 8.0 years ago by Devon Ryan 105k

Entering edit mode

Not quite everything. Lots of bacterial strand-specific RNA-seq is done using RNA ligation, and thus is -s 1.

But you are right that most human/mouse/etc samples seem to be using dNTP strategy recently.

ADD REPLY • link 7.3 years ago by predeus ★ 2.1k

	Run in single end mode:
	========== _____ _ _ ____ _____ ______ _____
	===== / ____\| \| \| \| _ \\| __ \\| ____\| /\ \| __ \
	===== \| (___ \| \| \| \| \|_) \| \|__) \| \|__ / \ \| \| \| \|
	==== \___ \\| \| \| \| _ <\| _ /\| __\| / /\ \ \| \| \| \|
	==== ____) \| \|__\| \| \|_) \| \| \ \\| \|____ / ____ \\| \|__\| \|
	========== \|_____/ \____/\|____/\|_\| \_\______/_/ \_\_____/
	v1.5.2

	//========================== featureCounts setting ===========================\
	\|\| \|\|
	\|\| Input files : 1 BAM file \|\|
	\|\| P 10_S48_R1_001.fastq.gzAligned.sortedByCoor ... \|\|
	\|\| \|\|
	\|\| Output file : featurecounts-test.txt \|\|
	\|\| Summary : featurecounts-test.txt.summary \|\|
	\|\| Annotation : /export/jonassenfs/michaeld/licebase/genomed ... \|\|
	\|\| Assignment details : <input_file>.featureCounts \|\|
	\|\| (Note that files are saved to the output directory) \|\|
	\|\| \|\|
	\|\| Dir for temp files : ./ \|\|
	\|\| \|\|
	\|\| Threads : 16 \|\|
	\|\| Level : meta-feature level \|\|
	\|\| Paired-end : no \|\|
	\|\| Strand specific : stranded \|\|
	\|\| Multimapping reads : counted \|\|
	\|\| Multi-overlapping reads : not counted \|\|
	\|\| Min overlapping bases : 1 \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//

	//================================= Running ==================================\
	\|\| \|\|
	\|\| Load annotation file /export/jonassenfs/michaeld/licebase/genomedata/l ... \|\|
	\|\| Features : 59324 \|\|
	\|\| Meta-features : 13594 \|\|
	\|\| Chromosomes/contigs : 2193 \|\|
	\|\| \|\|
	\|\| Process BAM file 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam... \|\|
	\|\| Paired-end reads are included. \|\|
	\|\| Assign reads to features... \|\|
	\|\| Total reads : 167713978 \|\|
	\|\| Successfully assigned reads : 75979398 (45.3%) \|\|
	\|\| Running time : 14.17 minutes \|\|
	\|\| \|\|
	\|\| Read assignment finished. \|\|
	\|\| \|\|
	\|\| Summary of counting results can be found in file "featurecounts-test.txt" \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//


	===== Run in paired-end mode:

	========== _____ _ _ ____ _____ ______ _____
	===== / ____\| \| \| \| _ \\| __ \\| ____\| /\ \| __ \
	===== \| (___ \| \| \| \| \|_) \| \|__) \| \|__ / \ \| \| \| \|
	==== \___ \\| \| \| \| _ <\| _ /\| __\| / /\ \ \| \| \| \|
	==== ____) \| \|__\| \| \|_) \| \| \ \\| \|____ / ____ \\| \|__\| \|
	========== \|_____/ \____/\|____/\|_\| \_\______/_/ \_\_____/
	v1.5.2

	//========================== featureCounts setting ===========================\
	\|\| \|\|
	\|\| Input files : 1 BAM file \|\|
	\|\| P 10_S48_R1_001.fastq.gzAligned.sortedByCoor ... \|\|
	\|\| \|\|
	\|\| Output file : featurecounts-test-p.txt \|\|
	\|\| Summary : featurecounts-test-p.txt.summary \|\|
	\|\| Annotation : /export/jonassenfs/michaeld/licebase/genomed ... \|\|
	\|\| Assignment details : <input_file>.featureCounts \|\|
	\|\| (Note that files are saved to the output directory) \|\|
	\|\| \|\|
	\|\| Dir for temp files : ./ \|\|
	\|\| \|\|
	\|\| Threads : 16 \|\|
	\|\| Level : meta-feature level \|\|
	\|\| Paired-end : yes \|\|
	\|\| Strand specific : stranded \|\|
	\|\| Multimapping reads : counted \|\|
	\|\| Multi-overlapping reads : not counted \|\|
	\|\| Min overlapping bases : 1 \|\|
	\|\| \|\|
	\|\| Chimeric reads : counted \|\|
	\|\| Both ends mapped : required \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//

	//================================= Running ==================================\
	\|\| \|\|
	\|\| Load annotation file /export/jonassenfs/michaeld/licebase/genomedata/l ... \|\|
	\|\| Features : 59324 \|\|
	\|\| Meta-features : 13594 \|\|
	\|\| Chromosomes/contigs : 2193 \|\|
	\|\| \|\|
	\|\| Process BAM file 10_S48_R1_001.fastq.gzAligned.sortedByCoord.out.bam... \|\|
	\|\| Paired-end reads are included. \|\|
	\|\| Assign fragments (read pairs) to features... \|\|
	\|\| \|\|
	\|\| WARNING: reads from the same pair were found not adjacent to each \|\|
	\|\| other in the input (due to read sorting by location or \|\|
	\|\| reporting of multi-mapping read pairs). \|\|
	\|\| \|\|
	\|\| Read re-ordering is performed. \|\|
	\|\| \|\|
	\|\| Total fragments : 83857077 \|\|
	\|\| Successfully assigned fragments : 4319884 (5.2%) \|\|
	\|\| Running time : 17.50 minutes \|\|
	\|\| \|\|
	\|\| Read assignment finished. \|\|
	\|\| \|\|
	\|\| Summary of counting results can be found in file "featurecounts-test-p.tx \|\|
	\|\| t" \|\|
	\|\| \|\|
	\===================== http://subread.sourceforge.net/ ======================//


	STAR Log final out:

	Started job on \| May 31 15:17:16
	Started mapping on \| May 31 15:17:16
	Finished on \| May 31 15:38:34
	Mapping speed, Million of reads per hour \| 246.94

	Number of input reads \| 87662433
	Average input read length \| 148
	UNIQUE READS:
	Uniquely mapped reads number \| 80934353
	Uniquely mapped reads % \| 92.33%
	Average mapped length \| 148.30
	Number of splices: Total \| 18880378
	Number of splices: Annotated (sjdb) \| 14710103
	Number of splices: GT/AG \| 18348626
	Number of splices: GC/AG \| 39184
	Number of splices: AT/AC \| 3555
	Number of splices: Non-canonical \| 489013
	Mismatch rate per base, % \| 0.38%
	Deletion rate per base \| 0.01%
	Deletion average length \| 1.47
	Insertion rate per base \| 0.01%
	Insertion average length \| 1.35
	MULTI-MAPPING READS:
	Number of reads mapped to multiple loci \| 1192938
	% of reads mapped to multiple loci \| 1.36%
	Number of reads mapped to too many loci \| 0
	% of reads mapped to too many loci \| 0.00%
	UNMAPPED READS:
	% of reads unmapped: too many mismatches \| 0.00%
	% of reads unmapped: too short \| 6.29%
	% of reads unmapped: other \| 0.02%
	CHIMERIC READS:
	Number of chimeric reads \| 0
	% of chimeric reads \| 0.00%