Good evening, chaps. I'm analysing 12 paired-end bulk RNA-seq samples coming from the SRA and when it comes to mapping those reads to a reference genome, HISAT2 is complaining about foo_R1.fastq and foo_R2_.fastq having an unequal number of reads. To be precise, the whole error message is:
Error, fewer reads in file specified with -2 than in file specified with -1
terminate called after throwing an instance of 'int'
(ERR): hisat2-align died with signal 6 (ABRT)
I know for a fact that the pipeline is working properly up until this step, so modifying the previous steps is out of the question.
I heard somewhere that you can pad your files so that they have an equal number of reads, but I can't find the source. How can I do that, so that I can successfully align those reads to the aforementioned reference genome?
Thanks in advance.
The error indicates that exactly this is not the case. Did you trim these sequences? The rookie mistake here is to use a trimmer that is not paired-end-aware, so it kicks out a read from one file but not from the other. Alternatively, the files could already be corrupt as downloaded from NCBI, which is very unlikely but not impossible. Please describe your pipeline with the relevant code.
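A quick way to confirm what HISAT2 is reporting is to count the records in both files yourself. The following is only a minimal sketch: the function names are made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequences).

```python
def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file.

    Assumption: strict 4-lines-per-record layout (no wrapped sequences).
    """
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 suggests truncation.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def mates_match(r1_path, r2_path):
    """Return (n_r1, n_r2, equal) for a pair of mate files."""
    n_r1 = count_fastq_records(r1_path)
    n_r2 = count_fastq_records(r2_path)
    return n_r1, n_r2, n_r1 == n_r2
```

Calling `mates_match("foo_R1.fastq", "foo_R2_.fastq")` on the files from the question would show how far apart the two counts are. Note that equal counts alone do not prove the mates are in the same order.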
There's no need to be smug about it, especially since you don't know how noisy/badly sequenced the reads are (and they really are).
Yes, I preprocessed the reads with a paired-end-aware trimmer, and already found the answer elsewhere.
For the record, the answer was to process the reads with fastq_pair.
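For anyone landing here later, the general idea of re-pairing can be sketched in Python as below. This is not fastq_pair's actual implementation, just an illustration of the same concept: keep only reads whose ID is present in both files, write them out in the same order, and set the orphans aside. The function names and output file names are hypothetical, and it assumes uncompressed 4-line FASTQ, unique read IDs within each file, and mate IDs that are identical up to the first whitespace (optionally with a trailing /1 or /2).

```python
def read_id(header):
    """Reduce a FASTQ header line to a mate-independent read ID."""
    name = header[1:].split()[0]          # drop '@' and any description
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]                  # drop old-style mate suffix
    return name


def parse_fastq(path):
    """Yield (read_id, record) pairs; a record is a tuple of 4 lines."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:                # end of file
                break
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            yield read_id(header), (header, seq, plus, qual)


def repair_pairs(r1_path, r2_path, out_prefix):
    """Write paired and orphan reads; return (n_paired, n_single)."""
    # Holds all of R2 in memory, so this only suits modest file sizes.
    mates = dict(parse_fastq(r2_path))
    n_paired = n_single = 0
    with open(out_prefix + "_R1.paired.fastq", "w") as out1, \
         open(out_prefix + "_R2.paired.fastq", "w") as out2, \
         open(out_prefix + ".single.fastq", "w") as single:
        for rid, record in parse_fastq(r1_path):
            mate = mates.pop(rid, None)
            if mate is None:
                # R1 read with no partner in R2
                single.write("\n".join(record) + "\n")
                n_single += 1
            else:
                out1.write("\n".join(record) + "\n")
                out2.write("\n".join(mate) + "\n")
                n_paired += 1
        # Whatever is left in the table had no partner in R1
        for record in mates.values():
            single.write("\n".join(record) + "\n")
            n_single += 1
    return n_paired, n_single
```

The two `.paired.fastq` outputs then contain the same number of reads in the same order, which is what HISAT2 expects from `-1` and `-2`. For real SRA-sized files a dedicated tool is the better choice, since this sketch loads one whole file into memory.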
This topic can now be closed.