Question

Why 50bp Illumina run produces 51bp long sequencs?

4

Entering edit mode

9.2 years ago

Tori ▴ 90

Why 50bp Illumina run produces 51bp reads? Do I have to remove one extra base before starting the analysis?

illumina next-gen • 7.1k views

ADD COMMENT • link updated 3.8 years ago by Michael ▴ 270 • written 9.2 years ago by Tori ▴ 90

Ram · Answer 1 · 2015-10-22

I highly recommend removing the extra base. In all of my tests, the extra base (101 on a 100bp run, 151 on a 150bp run, etc) had an extremely high error rate, typically 5x that of the previous base - and the quality score of the last base is not very accurate, so quality-trimming will not reliably remove it - and fastQC will not reliably indicate whether or not it needs to be trimmed, either. For that reason, I put a feature in BBDuk, forcetrimmodulo, just to remove that last base. For example:

bbduk.sh in=reads.fq out=trimmed.fq forcetrimmodulo=5

That trims all of the reads so that the length is equal to zero modulo 5, meaning that 51bp reads will be trimmed to 50bp; 76 will be trimmed to 75; 101 will be trimmed to 100; etc. If the reads are already (for example) 100bp long, then nothing will happen. In the processing order internal to bbduk, this forcetrimmodulo (or ftm for short) trimming happens before adapter trimming (if that is enabled), which is important because trimming the last extra base makes it easier to identify adapter sequence due to the lower error rate. As a result I always enable ftm when doing adapter-trimming.

I've posted various graphs showing the huge drop in quality for the extra base, but you can generate them yourself if you want, like this:

bbmap.sh in=reads.fq ref=reference.fa mhist=mhist.txt qist=qhist.txt reads=10m

Then plot the mhist and qhist to see what the true accuracy of the bases is, by location.

Edit: Actually, I should have just followed Daniel's link :)

score 7 · Answer 2 · 2015-10-22

7

Entering edit mode

9.2 years ago

User 59 13k

It depends on the base mask used. See: http://seqanswers.com/forums/showthread.php?t=61997 for a good, but brief discussion on why you might want to drop the n+1th base

ADD COMMENT • link 9.2 years ago by User 59 13k