Hello all,
Data: Paired end, RNASeq data.
I had an issue with the featureCounts output Assigned reads are greater than the HISAT mapped on aligned concordantly exactly 1 time
From HISAT:
aligned concordantly exactly 1 time is 48335140
From featureCounts summary:
Assigned: 64074047
Assigned value is 1.32 times greater than HISAT mapping results. It's weird that Assigned value is higher than mapped reads. how do I solve this issue?
I used bam file from hisat2 to extract only uniquely concordant mapped reads code to address the issue but still I get 1.3x greater than hisat. code below:
# Run HISAT2
.......
# extract header from bam and save to sam file
…..
#extract uniquely concordant reads
samtools view sample-sorted.bam | \
awk 'BEGIN{FS="\t";OFS="\t"}{if ($NF=="NH:i:1" && $(NF-2)=="YT:Z:CP"){print $0}}' > \
sample-for-subread.sam
# merge header and sam file above
….
# sam to bam
….
#sort bam for featureCounts
samtools sort -o sample-for-subread-with-header-sorted.bam \
sample-for-subread-with-header.bam
#run featureCounts
featureCounts -p -B -d 20 -D 500000 -C --ignoreDup --primary -Q 1 -t CDS -a reference/canFam3.gtf -o sample_one_overlap.txt sample-for-subread-with-header-sorted.bam
Output:
featureCounts output:
Assigned 64074047
Unassigned_Unmapped 0
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 0
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 32309500
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 286733
HISAT2 Output:
61464086 reads; of these:
61464086 (100.00%) were paired; of these:
11848735 (19.28%) aligned concordantly 0 times
**48335140 (78.64%) aligned concordantly exactly 1 time**
1280211 (2.08%) aligned concordantly >1 times
Please help me to solve this.
It's featureCounts, not featureCount. I've edited your post and fixed all mentions of featureCounts.
What is the output of
samtools flagstat
?