Question

Paired-end reads somehow counted twice?

3.6 years ago

Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and am trying to extract read counts from FASTQ files.

I compared my result with the reference ("answer") count matrix, and my read counts are doubled.

(Image not available: it showed the answer read count matrix on the left and my result on the right.)

Could you please tell me what went wrong? I used these commands on Ubuntu to get my result:

hisat2 -p 50 \
-x [ENSEMBL reference file] \
-1 [fastq file_1] \
-2 [fastq file_2] \
-S [output file name].sam

samtools sort -@ 8 -o [output file name].bam [input file name].sam

featureCounts -p -T 10 -a [GTF file] \
-o [output file name] \
[input file name].bam

I think I failed to apply a paired-end option in one of the commands, but I couldn't figure out which one.

RNAseq raw-count fastq • 1.6k views

updated 3.6 years ago by GenoMax 151k • written 3.6 years ago by Simon Ahn ▴ 10

score 2 · Answer 1 · 2021-10-22

3.6 years ago

GenoMax 151k

Recent versions of featureCounts have an explicit option to count read pairs (--countReadPairs), for use together with -p. You should also provide the correct strandedness option (-s) in your command.
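For illustration, the corrected counting step might look like the sketch below. The wrapper name count_fragments, the FEATURECOUNTS variable, and the file names are hypothetical. The -s value (0 = unstranded, 1 = stranded, 2 = reversely stranded) depends on the library prep, so treat the 0 in the example call as a placeholder.

```shell
# Hypothetical wrapper: count fragments (read pairs) rather than individual reads.
# FEATURECOUNTS is an illustrative variable for pointing at a specific binary.
count_fragments() {
    strand=$1 gtf=$2 counts=$3 bam=$4
    "${FEATURECOUNTS:-featureCounts}" -p --countReadPairs -s "$strand" \
        -T 10 -a "$gtf" -o "$counts" "$bam"
}

# Example call (file names are placeholders):
# count_fragments 0 annotation.gtf counts.txt sample.sorted.bam
```

The only change from the command in the question is the addition of --countReadPairs and an explicit -s.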

3.6 years ago by GenoMax 151k

Basically, it sounds like they have tacitly changed how the tool operates, so most training materials have become outdated, leading to bugs and inconsistencies down the line ...

http://subread.sourceforge.net/

I don't even understand this:

Release 2.0.2, 29 March 2021: New parameter '--countReadPairs' is added to featureCounts to explicitly specify that read pairs will be counted, and the '-p' option in featureCounts now only specifies if the input reads are paired end (it also implied that counting of read pairs would be performed in previous versions).

It kind of sounds like in the past -p would count reads as pairs, whereas now one needs to pass -p and --countReadPairs together.

But then what effect does -p alone have?
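Since the behaviour depends on the release, one way to guard a pipeline is to check the installed version first. This is only a rough sketch: version_ge is a hypothetical helper, it relies on sort -V (available in GNU coreutils), and the assumed shape of the featureCounts -v output ("featureCounts v2.0.x") may not hold for every build.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only probe the version if featureCounts is actually on the PATH.
if command -v featureCounts >/dev/null 2>&1; then
    # Assumption: the version banner contains something like "v2.0.3";
    # capture both stdout and stderr because the stream is not assumed.
    installed=$(featureCounts -v 2>&1 | sed -n 's/.*v\([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -z "$installed" ]; then
        echo "could not parse the featureCounts version"
    elif version_ge "$installed" "2.0.2"; then
        echo "featureCounts $installed: pass -p --countReadPairs to count fragments"
    else
        echo "featureCounts $installed: -p alone implies counting read pairs"
    fi
fi
```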

3.6 years ago by Istvan Albert 102k

Problem solved thanks to you guys!

Specify that input data contain paired-end reads. featureCounts will terminate if the type of input reads (single-end or paired-end) is different from the specified type. To count fragments (instead of reads) for paired-end reads, the --countReadPairs parameter should also be specified.

According to the featureCounts manual, I should have added --countReadPairs to count each fragment (the forward and reverse mates of a pair) once. That explains why my counts were doubled: with -p alone, each mate was counted separately. As far as I can tell, -p on its own only makes the run stop when the input is the wrong type (single-end vs. paired-end). Thanks a lot!
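A quick way to spot this kind of doubling is to compute the per-gene ratio between the two count tables. The sketch below uses two made-up toy tables (gene, tab, count) rather than real data. With real data the ratio will not be exactly 2 for every gene, for example when only one mate of a pair is assigned.

```shell
# Toy tables with made-up numbers, standing in for the two count matrices.
ref=$(mktemp); mine=$(mktemp)
printf 'geneA\t10\ngeneB\t25\ngeneC\t0\n' > "$ref"
printf 'geneA\t20\ngeneB\t50\ngeneC\t0\n' > "$mine"

# For each gene present in both tables, print:
# gene, reference count, own count, own/reference ratio (NA if reference is 0).
ratios=$(awk -F'\t' '
    NR == FNR { ref[$1] = $2; next }
    ($1 in ref) {
        r = (ref[$1] > 0) ? $2 / ref[$1] : "NA"
        printf "%s\t%s\t%s\t%s\n", $1, ref[$1], $2, r
    }' "$ref" "$mine")
printf '%s\n' "$ratios"

rm -f "$ref" "$mine"
```

A ratio close to 2 across most genes suggests reads were counted instead of fragments.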

3.6 years ago by Simon Ahn ▴ 10