Rnaseq: Read Length Different From Expected
11.4 years ago

Hello all,

I have received paired-end reads for 40 samples. The reads are supposed to be 100bp per end. Instead, 20 of my samples are 101bp per end, while the other 20 are 100bp as expected.

Because of this, we assume that these 20 samples were all in the same lane and that, by accident, there was an extra iteration in the Illumina sequencing. However, we also see a strong negative correlation between read length and quality: the samples with 101bp per end lose about 30% of their reads in the Trimmomatic quality step.

My question is: Has this happened to anyone else? Does it occur often, and if so, does it often affect quality? We are really quite puzzled by this.

Any ideas / clues appreciated.

Thank you! -Thies Gehrmann

rnaseq quality • 3.5k views

Did you check if the first or last letter is the same for every read?
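One way to check this is a quick tally over each FASTQ file. The sketch below is only an illustration, assuming standard 4-line FASTQ records; the helper names `open_fastq` and `summarize_reads` are made up for this example. It counts read lengths along with the first and last base of every read, so a constant extra base at either end would show up as a single dominant letter.

```python
from collections import Counter
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def summarize_reads(lines):
    """Tally read lengths, first bases and last bases.

    Assumes 4-line FASTQ records, so the sequence is the 2nd line of each record.
    """
    lengths, first, last = Counter(), Counter(), Counter()
    for i, line in enumerate(lines):
        if i % 4 != 1:  # skip header, '+' separator and quality lines
            continue
        seq = line.strip()
        if not seq:
            continue
        lengths[len(seq)] += 1
        first[seq[0]] += 1
        last[seq[-1]] += 1
    return lengths, first, last
```

For a real file, something like `with open_fastq("sample_R1.fastq.gz") as fh: print(summarize_reads(fh))` would do (the file name is a placeholder). If nearly all 101bp reads share the same first or last base, that points to an artifact rather than real sequence.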

11.4 years ago
venks ▴ 740

I have had Illumina 1.9 FASTQ data with read lengths of 101bp, so that length does occur. If it is something that @theisgehrmann expects, then I guess the poor read quality in the last 30bp might be caused by the universal adapters, which end up in the read when the fragment size is small.
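A rough way to test the adapter idea is to look for the adapter sequence directly in the reads. This is a minimal sketch, assuming the library was built with TruSeq-style adapters, whose common prefix is `AGATCGGAAGAGC`; if a different kit was used, that constant needs to change. The function names are invented for this example.

```python
# Assumption: TruSeq-style adapters; replace if the library kit differs.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_positions(sequences, adapter=ADAPTER_PREFIX):
    """Return the start index of the first adapter hit for each read that has one."""
    return [seq.find(adapter) for seq in sequences if adapter in seq]


def adapter_fraction(sequences, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads containing the adapter (exact match only)."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    return len(adapter_positions(sequences, adapter)) / len(sequences)
```

Comparing `adapter_fraction` between the 100bp and 101bp samples would show whether adapter read-through is really more common in the longer ones. Exact matching misses adapters with sequencing errors, so treat the number as a lower bound.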

Good luck

11.4 years ago

I believe that the sequencing instrument can only be run at read lengths that are increments of 50bp, so 101bp sequencing does not seem possible.

One can however run the bcl->fastq conversion with arbitrary parameters.

Is it possible that someone overrode the information in the setup file and ran the conversion with incorrect barcode lengths? I could see how someone using an automated script that generates the conversion scripts entered a barcode one base too short, which in turn affects the other lengths.


I guess I'll have to call them up! Thank you!


The 50bp increments hold for HiSeq, but not for, e.g., the GAII, which does 36, 75 and 100 (http://www.biotech.wisc.edu/facilities/dnaseq/sequencing/Illumina). However, I agree with the basic point that 101 seems unlikely :)
