Hi, I am running STAR on my human samples against GRCh38. However, some of my samples have a very low alignment rate: most of the reads are reported as "unmapped: too short". Are there any suggestions on how to deal with this?
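To compare samples, it can help to pull the "too short" percentage out of each sample's `Log.final.out`. Here is a minimal Python sketch, assuming the usual `name |<tab>value` layout of that file. The excerpt and its numbers are made up for illustration. For a real file you would pass `Path("Log.final.out").read_text()` (from `pathlib`) instead of `EXAMPLE`.

```python
def parse_star_log(text: str) -> dict:
    """Parse the 'name |<TAB>value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers such as "UNMAPPED READS:"
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


def percent(stats: dict, name: str) -> float:
    """Convert a percentage field such as '70.12%' to a float."""
    return float(stats[name].rstrip("%"))


# Made-up excerpt mimicking the Log.final.out layout (not real data).
EXAMPLE = """\
                          Number of input reads |\t1000000
                        Uniquely mapped reads % |\t25.40%
                                  UNMAPPED READS:
                 % of reads unmapped: too short |\t70.12%
                     % of reads unmapped: other |\t1.03%
"""

stats = parse_star_log(EXAMPLE)
print(percent(stats, "% of reads unmapped: too short"))
```

Running this over every sample's log makes it easy to see whether the problem is limited to a few libraries.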
I was puzzling over this exact thing just yesterday, getting ~70% or so reported as "too short" by STAR (different scenario though: rhesus macaque antibody reads mapped to antibody locus scaffolds). Does anybody know what actually qualifies as "too short"? Is it a threshold on the match length? I couldn't find this defined anywhere. For what it's worth, STAR's author suggests using untrimmed reads in this situation, which was the opposite of what I (and others here?) would have thought. That might just be about diagnosing problems with the trimming for paired reads though, not sure.
I'll be curious to hear what it ends up being for your case.
I'm not sure, but have you checked the quality of your data with FastQC before alignment?
You need to check three levels of QC:
1- Quality: trim bases with quality below Q20/Q30.
2- Adapters: remove any adapter contamination.
3- Biased first bases: remove contaminated bases at the 5' end, or in some samples the 3' end (10-15 bases is usually enough).
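If it helps to see what those three steps do to a single read, here is a simplified Python sketch. It is not how any particular trimmer (fastp, Trimmomatic, cutadapt, ...) implements them, and only trims low-quality bases from the 3' end. The adapter `AGATCGGAAGAGC` (the common start of Illumina TruSeq adapters), Phred+33 quality encoding, the Q20 cutoff and the 10-base head crop are all assumptions you would adjust to your library.

```python
def trim_read(seq: str, qual: str,
              adapter: str = "AGATCGGAAGAGC",  # assumed TruSeq adapter prefix
              min_q: int = 20,                 # step 1: quality cutoff (Q20)
              head_crop: int = 10,             # step 3: bases removed from the 5' end
              min_overlap: int = 3):
    """Return (seq, qual) after adapter clipping, 3' quality trimming and 5' head crop."""
    # Step 2 (adapter): cut at a full adapter match, or at a partial adapter
    # (at least min_overlap bases) hanging off the 3' end of the read.
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    seq, qual = seq[:cut], qual[:cut]

    # Step 1 (quality): drop trailing bases whose Phred+33 score is below min_q.
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # Step 3 (head crop): remove the first head_crop bases.
    return seq[head_crop:], qual[head_crop:]


# Made-up read: 10 biased bases, 20 "real" bases, then 5 bases of adapter.
print(trim_read("T" * 10 + "ACGT" * 5 + "AGATC", "I" * 35))
```

A small `min_overlap` catches short adapter remnants but will also clip reads that end in those bases by chance, which is the usual trade-off with 3' adapter trimming.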
Trying another aligner may also be worthwhile, but it probably won't change much.
If you have already dealt with all of the above, you will need to look for other causes.
Assuming you have paired-end RNA-seq reads, you can run fastp before mapping to get an idea of possible adapter sequences and insert sizes.
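For paired-end data, fastp can detect adapters by overlap analysis: if read 1 and the reverse complement of read 2 overlap, the position of that overlap gives the insert size, and an insert shorter than the reads means they ran into the adapter. Below is a rough Python sketch of that idea, not fastp's actual code. The `min_overlap=10` and `max_mismatch=1` settings and the example fragment are assumptions for illustration.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def estimate_insert_size(r1: str, r2: str, min_overlap: int = 10,
                         max_mismatch: int = 1) -> Optional[int]:
    """Estimate a read pair's insert size from the R1 / revcomp(R2) overlap.

    d is the offset of revcomp(R2) relative to the start of R1; it is negative
    when the insert is shorter than R2, i.e. the reads contain adapter sequence.
    Returns None if no offset gives at least min_overlap overlapping bases
    with at most max_mismatch mismatches.
    """
    r2rc = revcomp(r2)
    candidates = []
    for d in range(-(len(r2rc) - min_overlap), len(r1) - min_overlap + 1):
        start, end = max(0, d), min(len(r1), d + len(r2rc))
        a, b = r1[start:end], r2rc[start - d:end - d]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch:
            candidates.append((end - start, d))
    if not candidates:
        return None
    _, d = max(candidates)  # prefer the longest overlap
    # The fragment runs from the start of R1 to the end of revcomp(R2).
    return d + len(r2rc)


fragment = "GATTACACCGTAGCTTGACCATGCA"  # made-up 25 bp insert
r1 = fragment[:20]                     # 20 bp read 1
r2 = revcomp(fragment[5:])             # 20 bp read 2 from the other end
print(estimate_insert_size(r1, r2))
```

If many pairs come out with inserts shorter than the read length, adapter read-through is likely, which fits with STAR reporting those reads as "too short".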
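STAR's "too short" does not literally mean the reads are too short. It means they didn't align: no alignment covered a large enough fraction of the read to pass STAR's minimum-alignment filters.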
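To answer the "what qualifies" question a bit more concretely: as far as I know, the relevant filters are `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` (both 0.66 by default, normalized to the sum of the mate lengths for paired-end reads), plus the absolute `--outFilterScoreMin` and `--outFilterMatchNmin` (both 0 by default). The Python sketch below is a simplified model of that check, not STAR's code. STAR's real alignment score includes mismatch, gap and junction penalties, whereas here you just pass in the numbers.

```python
def is_too_short(mate_lengths: list, matched_bases: int, score: int,
                 score_min_over_lread: float = 0.66,   # --outFilterScoreMinOverLread
                 match_nmin_over_lread: float = 0.66,  # --outFilterMatchNminOverLread
                 score_min: int = 0,                   # --outFilterScoreMin
                 match_nmin: int = 0) -> bool:         # --outFilterMatchNmin
    """Simplified model of STAR's minimum-alignment filters."""
    lread = sum(mate_lengths)  # sum of mate lengths for paired-end reads
    score_ok = score >= max(score_min, score_min_over_lread * lread)
    match_ok = matched_bases >= max(match_nmin, match_nmin_over_lread * lread)
    return not (score_ok and match_ok)


# 2x100 bp pair where only one mate aligns: 100 of 200 bases < 0.66 * 200.
print(is_too_short([100, 100], matched_bases=100, score=98))
# Both mates align.
print(is_too_short([100, 100], matched_bases=196, score=192))
```

The first case shows why a pair can be "too short" even when one mate aligns perfectly. A commonly suggested workaround is lowering both `...OverLread` parameters (e.g. to 0.3), at the cost of accepting more partial alignments.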