featureCounts paired end reads wrongly assigned on the single end mode
1
0
Entering edit mode
2.0 years ago
UserA • 0

Hello! I have RNAseq data from samples containing 2 to 3 bacterial strains each. For the analysis I performed FastQC, trimmomatic to remove adapters and then Bowtie2 to align the reads to the reference genomes. I have one bowtie2 output file per sample with the paired ends and another one with the singles. I want to make one output file per reference genome with featurecounts for the paired ends on one side and the singles on the other.

For the paired end I run this command : featureCounts -T 24 -p -M -g Parent -a $x -o ${Output}/${genome}.txt ${Samples}/${genome}_paired

In output I get this: enter image description here

I don't understand why in the running featurecounts part assign them in single end mode ?

when I look at one of the paired end input files I get this:

97320876 + 0 in total (QC-passed reads + QC-failed reads)
97320876 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
591739 + 0 mapped (0.61% : N/A)
591739 + 0 primary mapped (0.61% : N/A)
97320876 + 0 paired in sequencing
48660438 + 0 read1
48660438 + 0 read2
499606 + 0 properly paired (0.51% : N/A)
518434 + 0 with itself and mate mapped
73305 + 0 singletons (0.08% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
featureCounts • 2.0k views
ADD COMMENT
1
Entering edit mode

You need to include

 --countReadPairs    Count read pairs (fragments) instead of reads. This option
                      is only applicable for paired-end reads.

along with

 -p                  Specify that input data contain paired-end reads. To
                      perform fragment counting (ie. counting read pairs), the
                      '--countReadPairs' parameter should also be specified in
                      addition to this parameter.

to count the read pairs.

ADD REPLY
0
Entering edit mode

Thanks a lot for the answer, from now on featurecounts takes well into account that they are paired reads. On the other hand, the output results give me very high values of : Unassigned_Unmapped and Unassigned_NoFeatures. For example : enter image description here

As I work with samples that contain several bacteria this does not really surprise me for the other reference annotation genomes as I have much less RNA corresponding to these strains in my samples. However, for this strain (which takes almost 70% of my samples) I also get low results which worries me more.

for the strain that represents about 70% of my samples I have : enter image description here

and for another strain that represents like 1% I have : enter image description here

I'm doing this script :

featureCounts -T 24 -p --countReadPairs -s 2 -g Parent -a $x -o ${Output}/${genome}.txt ${Samples}/${genome}_paired

I took the reference genome and annotation file from the same webpage on NCBI website, did the alignement with Bowtie2, converted sam to bam with samtools view and sorted the samples by read name with samtools sort.

ADD REPLY
1
Entering edit mode

If you are working with a mix of genomes then it is possible that your strain is only a small fraction of the total data.

You need to address what @swbarnes2 pointed out below. featureCounts only works with data that is aligned in first place. And if you have really poor alignment % then that needs to be addressed first.

ADD REPLY
0
Entering edit mode

For the bacteria representing ~70% (i'm talking to 70% because I made qPCR on the samples) of my sample I have this :

enter image description here

The result of 591739 + 0 mapped (0.61% : N/A) is from a bacteria representing like 1% of the sample

I thought that since this bacteria mapped well (95 to 98% depending on the sample) I would have better results for the featurecounts output with it, more than 1 to 8% succesfully assigned alignements.

ADD REPLY
1
Entering edit mode

If you are trying to make a bam file of the reads aligned to a single reference, don't do it like that. Subset the bam you have to just the reads aligning where you want them. Though since there might be some similar regions between all your bacteria, this pipeline might not be appropriate for what you ae doing. Neither Bowtie nor FeatureCounts is very smart about reads which might align to multiple places.

ADD REPLY
0
Entering edit mode

Have you looked at the alignments in IGV or a similar browser? If your reads are multi-mapping then those are going to be not counted by default. You may also have a lot of rRNA in your data so that would be something else to check on.

ADD REPLY
0
Entering edit mode

I haven't looked at the alignment on IGV, I'm very new to it, but I'll try! In Bowtie2 output for this strain which is the most present in my samples I have from 1 to 7% of reads that align more than once.

enter image description here

At the sequencing platform there was depletion of rRNA so I assumed it was ok on that side but I will check. Do you check this with IGV too?

ADD REPLY
1
Entering edit mode

Most the data appears to align one time so that looks reasonable. Do you get more assignments if you try -s 0 (non-stranded) option? Are you certain your libraries are "reverse stranded"?

ADD REPLY
0
Entering edit mode

I get a little more successfully assigned alignements with the -s 0 option. I'm not sure, I sent my samples to Illumina NovaSeq 6000 sequencing. I will ask the sequencing platform to be sure. I don't understand why I have a good percentage of alignement for this strain (between 80 and 95%) but a low percentage of successfully assigned alignements (between 2 and 9%).

ADD REPLY
0
Entering edit mode
2.0 years ago

591739 + 0 mapped (0.61% : N/A)

Unless you really expect this for some reason, forget about FeatureCounts until you fix this.

ADD COMMENT

Login before adding your answer.

Traffic: 2688 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6