I have data from 3 Illumina lanes (same tissue, different conditions). The number of reads obtained is comparable. Yet, when I align the reads to the reference genome, one of the lanes has 1/3 of the alignments.
How can I explain this? Why do so many reads not make it into the TopHat output even though they are all good-quality reads (Q>30)?
Originally I had 115,022,942 reads x 2 (paired).
This is the flagstat output:
35,947,783 mapped (100%)
35,947,783 paired in sequencing (100%)
28,432,680 properly paired (79.09%)
28,791,584 with itself and mate mapped
7,156,199 singletons (19.91%)
It might help if you added the flagstat output for one of the lanes that mapped better. You could also run FastQC on your FASTQ input files to see whether something is wrong with the 3rd lane.
You might want to clarify the question a bit; my first reaction was that of course each of the three lanes would have 1/3 of the alignments :-)
@Ketil: I meant that one lane has 33% of its original reads aligned. The other two have close to 100% of their reads aligned.
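A quick sanity check on the numbers in the question may help pin down what "33%" refers to. The sketch below is only arithmetic on the figures already posted; the function name `mapping_rate` is made up for illustration. It assumes the flagstat "mapped" count can be compared directly with the input read count, which may not hold exactly, because flagstat counts alignment records and a multi-mapping read can contribute more than one record.

```python
def mapping_rate(mapped: int, total: int) -> float:
    """Return the percentage of `total` accounted for by `mapped`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * mapped / total


# Figures taken from the question.
read_pairs = 115_022_942        # "115,022,942 reads x 2 (paired)"
mapped_records = 35_947_783     # flagstat "mapped" line

# Denominator 1: every individual read (both mates of each pair).
rate_vs_reads = mapping_rate(mapped_records, 2 * read_pairs)

# Denominator 2: number of pairs (i.e. reads per mate file).
rate_vs_pairs = mapping_rate(mapped_records, read_pairs)

print(f"mapped / (2 x pairs): {rate_vs_reads:.2f}%")
print(f"mapped / pairs:       {rate_vs_pairs:.2f}%")
```

With these inputs the first ratio comes out near 15.6% and the second near 31%, so the "33%" figure seems to be relative to the number of pairs rather than to all individual reads. Running the same calculation for the two lanes that mapped well would show whether the comparison uses the same denominator.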
We also have this problem: only half of our reads map to the genome, and I have no idea what is going wrong. I will follow this discussion.