2.9 years ago
Mike
Hi, I downloaded a bunch of RNA-seq reads (mostly single-end) and tried to align them to a reference genome with the STAR aligner (without trimming!).
Initially I got this error:
EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR9434783.1
TGGGAAATGACCCTCC..............
So I checked my fastq.gz file to see whether any read had a sequence whose length does not match its quality string, using:
zgrep -B4 -A8 "@SRR9434783.1" SRR9434783_1.fastq.gz
I got tons of results. In the first one, the sequence and the quality string have the same length (701).
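A note on the check above: the pattern "@SRR9434783.1" also matches @SRR9434783.10, @SRR9434783.100 and so on (and "." is a regex wildcard), which is probably why zgrep returned so many hits. Looking at one record by eye also does not rule out a bad record further down the file. Below is a minimal Python sketch that scans the whole file instead. It assumes a plain 4-line-per-record FASTQ (no wrapped sequence lines); the function name `find_length_mismatches` and the `limit` parameter are made up for this illustration and are not part of STAR or any other tool.

```python
import gzip
import sys


def find_length_mismatches(handle, limit=10):
    """Return up to `limit` tuples (record_number, read_id, seq_len, qual_len)
    for records whose sequence and quality lengths differ.

    Assumes strict 4-line FASTQ records: header, sequence, '+', quality.
    """
    bad = []
    record = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        record += 1
        if len(seq) != len(qual):
            read_id = header.strip().split(" ")[0]
            bad.append((record, read_id, len(seq), len(qual)))
            if len(bad) >= limit:
                break
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open the gzipped FASTQ in text mode and report any mismatched records.
    with gzip.open(sys.argv[1], "rt") as fh:
        for rec, read_id, n_seq, n_qual in find_length_mismatches(fh):
            print(f"record {rec} {read_id}: seq={n_seq} qual={n_qual}")
```

Saved under a name of your choice and run with the fastq.gz path as its only argument, it prints nothing if every record is consistent. If it reports records, the file itself is damaged (for example a truncated download); if it reports none, the problem is more likely in how the aligner handles these reads.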
I downloaded the SRX entries from NCBI. Is there a solution?
Thanks a lot.
Your FASTQ data looks very unusual. The quality characters do not appear to follow the widely used Phred+33 encoding, so STAR may not be able to parse the quality line correctly. Also, as far as I know, a read of about 700 bp is too long for a short-read next-generation sequencing platform such as Illumina. I suspect this is not typical NGS data, which would explain why STAR cannot deal with it.
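One way to test the encoding point is to look at the range of ASCII codes in the quality lines. The sketch below uses a common rule of thumb: characters below ASCII 59 can only occur in Phred+33 data, while data whose lowest character is 64 or above and that also goes beyond 'J' (74) is more likely Phred+64. This is a heuristic under those assumptions, not a definitive test, and the function name `guess_phred_offset` is hypothetical.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality encoding from the ASCII range of the characters.

    Returns one of: "unknown", "invalid", "phred33", "phred64", "ambiguous".
    Heuristic only; high-quality Phred+33 data can look like Phred+64.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return "unknown"
    lo, hi = min(codes), max(codes)
    if lo < 33 or hi > 126:
        # Outside the printable range that FASTQ quality strings may use.
        return "invalid"
    if lo < 59:
        # Characters this low do not occur in the +64 encodings.
        return "phred33"
    if lo >= 64 and hi > 74:
        return "phred64"
    return "ambiguous"


print(guess_phred_offset(["IIII#5", "FFFF,,"]))
```

Feeding it the quality lines of the first few thousand reads is usually enough; an "invalid" result would point to a corrupted or non-standard file rather than to an encoding mismatch.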
You were right. STAR's author replied that it does not work well with long reads; he suggests using minimap2 or STARlong instead.