Has anyone with experience using both HISAT2 and STAR ever observed a large difference (0 vs 400+) in the counts for a gene depending on which aligner they used?
I processed some RNA-seq data first using STAR and used quantmode to get the counts. Afterwards I checked a few genes that are known to be regulated by the target protein of interest that was inhibited with a drug. Most of the genes looked fine (i.e. down in treated and up in vehicle). But, one gene called IFI27 had a 0 count across all samples, I confirmed this result using several other counting programs (htseq and feature counts). I then aligned using hisat2 and counted with both htseq and featurecounts, which produced the expected result for IFI27 with more than 400 counts for the vehicles and 17 counts for the treated samples. Other genes had small differences but nothing close to the difference for IFI27.
So to me it makes sense to use hisat2 for my mapping. However, I am bothered by the fact that STAR is highly regarded but somehow could not handle this specific gene. I was curious to see if anyone else has encountered this or has any thoughts on what might be happening.
Update: I recently mapped and quantified with Salmon and again this produced 400+ counts for IFI27
STARindex Command
module load star/2.7.3a
STAR --runThreadN 23 --runMode genomeGenerate --genomeDir /home --genomeFastaFiles /home/RNA/GRCh38.p13_genomic.fna --sjdbGTFfile /home/RNA/GRCh38.p13_genomic.gtf --sjdbOverhang 149
STARalign Command
STAR \
--outSAMattributes All \
--outSAMtype BAM SortedByCoordinate \
--quantMode GeneCounts \
--readFilesCommand gunzip -c \
--runThreadN $Threads \
--sjdbGTFfile $GTFFILE \
--outReadsUnmapped Fastx \
--outMultimapperOrder Random \
--outWigType wiggle \
--genomeDir $GENOMEDIR \
--readFilesIn ${INPUTDIR}/${OUTPREFIX}_1.fq.gz ${INPUTDIR}/${OUTPREFIX}_2.fq.gz \
--outFileNamePrefix $OUTPREFIX
HISAT2index Command
hisat2-build -p 23 -f /home/refFiles/GRCh38.p13_genomic.fna GRCh38
HISAT2align Command
hisat2 -p 8 -q -x $GENOMEDIR -1 ${INPUTDIR}/${OUTPREFIX}_1.fq.gz -2 ${INPUTDIR}/${OUTPREFIX}_2.fq.gz | \
Did you check the alignment results in a genome browser like IGV? That might give you some ideas about what caused the differences.
I will try that, I think its something with STAR, because I had counts when I quantified with Salmon
You are confounding STAR mapping with STAR quantification (and mapping and quantification in general). STAR, in particular, only counts reads to genes if the reads can be unambiguously assigned to it. It may well be the case STAR mapped many reads to the gene in question, but counted 0 reads, because the reads could be multi-mappers (e.g., there is a paralogue of the gene), or reads couldn't be unambiguously assigned to the gene (e.g. the gene overlaps with another gene).
So, as igor suggested, to check if the mappings differ between STAR and HISAT2, load both bams in IGV and browse to the region in question.
Also, HISAT2 only performs mapping, so your question is incomplete, as you didn't say how you quantified read counts in the case of HISAT2 mapping.
I double checked the counting of the bam files generated by STAR using STARquant-mode, ht-seq and featurecounts all say 0 for IFI27. I did the same for HISAT2, excluding STARquant-mode, They all gave counts of 400+. I also mapped and quantified using SALMON and TXImeta these also showed positive counts of 400+.
This suggests STAR is either failing to align the relevant reads to IFI27 or failing to count them as you stated, however that fails to explain why every counting program gives a count of 0 for the bam files from STAR, but not HISAT2, and why SALMON also shows positive counts.
I hope to check the alignment sometime this week, though I am new to IVG and have run into issues with the index not being recognized. So I will need to troubleshoot that first.
Based on the information here, I don't think you can say if STAR is failing or if the other methods are counting problematic reads.
I was able to use IGV to view the alignments and both the STAR and HISAT2 bam files show a large number of mapped reads for IFI27. I reran feature counts on the bam files from both mappers to double check the counts for IFI27 and got the same results as previous. So, now I am lost as to how STAR maps the reads so that none of the counting programs recognize IFI27 features, but HISAT2 maps such that they can be recognized
Is STAR known for being more stringent then other aligners?
FeatureCount Summary
You should check the individual reads. Are they high quality, multi-mappers, etc? Does it look like they are actually mapping properly?
They appear to be spanning exon junctions. Is there a method for assessing the individual reads, besides visually inspecting them?