Hi everyone,
I hope you're doing well. I've been encountering a puzzling issue in my RNA-seq analysis pipeline and was hoping to get some insights from this knowledgeable community.
I'm currently working on an RNA-seq project in which I aligned my trimmed reads to the mouse reference genome (GRCm39) using HISAT2 and counted the reads using featureCounts. However, I'm getting significantly different alignment rates between the two tools. HISAT2 consistently reports alignment rates of around 60-75%, whereas the rate reported by featureCounts drops to approximately 30%.
I've been trying to figure out the reasons behind this discrepancy and have considered several factors based on discussions and suggestions from various sources:
- Quality of Reads:
I've ensured that my input reads are of good quality, with proper trimming and filtering steps applied to remove low-quality bases and adapter contamination.
- Reference Genome and Annotation Files:
I've used the GRCm39 mouse reference genome for both alignment and annotation. The GTF file used for annotation is specific to GRCm39.
- HISAT2 Alignment:
I've aligned the trimmed reads to the reference genome using HISAT2, ensuring correct indexing, and followed the recommended parameters for alignment.
- featureCounts:
For read counting, I've used featureCounts with the same GTF file used for HISAT2 alignment. The BAM files generated by HISAT2 were sorted, indexed, and used as input for featureCounts.
Does featureCounts do alignments?
Or is it telling you that only 30% of your reads intersect with exons?
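If it's the latter, the `.summary` file that featureCounts writes alongside the counts table should show where the remaining reads went. Here's a rough Python sketch that parses that file and prints the share of each status per BAM. It assumes the usual tab-separated layout (a `Status` header row with one column per BAM, then one row per category such as `Assigned` or `Unassigned_NoFeatures`). The numbers in `EXAMPLE` and the file name `sample1.bam` are made up for illustration and are not from your run, and the function names are my own.

```python
import csv
import io

# Hypothetical summary content, made up for illustration only.
EXAMPLE = (
    "Status\tsample1.bam\n"
    "Assigned\t3000\n"
    "Unassigned_Unmapped\t2500\n"
    "Unassigned_NoFeatures\t3500\n"
    "Unassigned_Ambiguity\t1000\n"
)


def parse_summary(summary_text):
    """Parse featureCounts .summary text into {sample: {status: count}}."""
    rows = [r for r in csv.reader(io.StringIO(summary_text), delimiter="\t") if r]
    samples = rows[0][1:]  # first column is the "Status" label
    result = {sample: {} for sample in samples}
    for row in rows[1:]:
        status = row[0]
        for sample, value in zip(samples, row[1:]):
            result[sample][status] = int(value)
    return result


def status_fractions(counts):
    """Return each status as a fraction of all entries listed for one sample."""
    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: n / total for status, n in counts.items()}


# To use a real file instead, read it with open(path).read() and pass the text in.
for sample, counts in parse_summary(EXAMPLE).items():
    print(sample)
    for status, frac in status_fractions(counts).items():
        if counts[status] > 0:  # skip categories with no reads
            print(f"  {status}: {counts[status]} ({frac:.1%})")
```

A large `Unassigned_NoFeatures` share would suggest reads falling outside the annotated exons (or possibly a strandedness setting that doesn't match the library), while `Unassigned_Unmapped` is simply the reads HISAT2 didn't align in the first place.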