Question

NextSeq sequences map poorly to reference genome

0

6.7 years ago

exin ▴ 60

Hi

We've just started using the Illumina NextSeq platform and haven't been getting good results sequencing our CEL-Seq2 library (75-cycle kit, R1 = 15 bp, R2 = 77 bp), in contrast to what we previously got on the HiSeq.

QC apparently "looks fine", but I'm not convinced, given that the deviation bars in FastQC are huge. %Q30 is ~78%.
The problem is that only ~20% of reads map to the reference genome, even after discarding reads with an average Q-score < 30. The tech advised that we had probably loaded too much DNA.

We paid for another run of the same library with 30% less loading (~0.9 pM) and a 20% PhiX spike-in. The same problem persists: %Q30 is better at 88%, but the sequences are again full of N's (20% of R1 sequences are all N's) and only ~20% map.

We've been told it could be the library, but there is definitely enough DNA in the ~200-400 bp size range.

I'd like to get an opinion on whether this is more likely a library-prep problem (old reagents?) or a sequencing problem (settings?). What could be the underlying cause?

Thank you!!!

sequencing rna-seq fastqc cel-seq2 • 3.2k views

ADD COMMENT • link updated 6.7 years ago by GenoMax 150k • written 6.7 years ago by exin ▴ 60
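For anyone wanting to reproduce the "20% of R1 sequences are all N's" figure from the question, here is a minimal Python sketch that counts fully masked reads in a FASTQ file. It assumes a plain four-line-per-record FASTQ (optionally gzipped); the function names and the example file name are illustrative and not from the original post.

```python
import gzip
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def all_n_fraction(seqs: Iterable[str]) -> float:
    """Return the fraction of reads whose bases are all N."""
    total = 0
    all_n = 0
    for seq in seqs:
        total += 1
        if seq and set(seq.upper()) == {"N"}:
            all_n += 1
    return all_n / total if total else 0.0


def all_n_fraction_in_file(path: str) -> float:
    """Open a (possibly gzipped) FASTQ file and report its all-N fraction."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return all_n_fraction(fastq_sequences(handle))
```

Calling `all_n_fraction_in_file("sample_R1.fastq.gz")` (a hypothetical file name) on R1 and R2 separately would show whether the masking is confined to the short read.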

1

What did your PhiX results look like? Same story?

ADD REPLY • link 6.7 years ago by Joe 22k

0

Hmmm.... I kind of assumed PhiX reads were not included in my fastq files and were only used for calculating the error rates. I might have to double check then. Thank you.

ADD REPLY • link 6.7 years ago by exin ▴ 60

1

I'm not 100% sure with the NextSeq. The software on the machine may have filtered them out already, but they were sequenced, so I would expect the data to be available. You might have to re-extract them from the BCL files.

If the PhiX looks good, though, that would tell you the issue is your input DNA.

ADD REPLY • link 6.7 years ago by Joe 22k

0

Does the cluster density look okay?

ADD REPLY • link 6.7 years ago by NB ▴ 960

0

1st run: ~190; 2nd run: ~95 (I'm not sure how this converts to the recommended 170-220 K/mm² range for NextSeq; probably the same scale?)

ADD REPLY • link 6.7 years ago by exin ▴ 60

0

Have a look in the RunCompletionStatus.xml file for ClusterDensity.

ADD REPLY • link 6.7 years ago by NB ▴ 960

0

That's the number: 95.

ADD REPLY • link 6.7 years ago by exin ▴ 60

0

If it says <ClusterDensity>95</ClusterDensity> then it's quite low. It's under-clustered.

ADD REPLY • link 6.7 years ago by NB ▴ 960
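The ClusterDensity lookup discussed in this exchange could be scripted roughly as below. This is only a sketch: it assumes the value is reported in K/mm² (the thread itself is unsure about the scale), uses the 170-220 range quoted above as the comparison window, and searches for any element named ClusterDensity rather than assuming a particular file layout. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cluster_density(xml_text: str) -> Optional[float]:
    """Return the first ClusterDensity value found in the XML, or None."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the element's exact position
    # in the file does not need to be known in advance.
    for elem in root.iter("ClusterDensity"):
        if elem.text and elem.text.strip():
            return float(elem.text)
    return None


def classify(density: float, low: float = 170.0, high: float = 220.0) -> str:
    """Compare a density (assumed K/mm^2) with the range quoted in the thread."""
    if density < low:
        return "under-clustered"
    if density > high:
        return "over-clustered"
    return "within range"
```

To use it on a real run, read the file contents first, for example `cluster_density(open("RunCompletionStatus.xml").read())`. If the file declares an XML namespace, the tag name passed to `iter()` would need the namespace prefix.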

score 1 · Answer 1 · 2018-08-24

I think the problem here may be a particular bcl2fastq option. By default, if a read is shorter than 22 bp (which your 15 bp R1 is), it is automatically masked with N's. I have a hunch that this is what is happening with this run. You will need to specify --mask-short-adapter-reads 0 to turn that masking off; you should then be able to recover sequence for all R1 reads.
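As a rough illustration of re-running the conversion with masking disabled, the sketch below builds a bcl2fastq command line in Python. The run-folder and output paths are placeholders. The optional `--minimum-trimmed-read-length` argument is an addition not mentioned in this answer; it is another bcl2fastq option that is sometimes lowered for very short reads, so treat its use here as an assumption to check against the bcl2fastq documentation for your version.

```python
import shlex
from typing import List, Optional


def bcl2fastq_command(
    run_folder: str,
    output_dir: str,
    min_trimmed_read_length: Optional[int] = None,
) -> List[str]:
    """Build a bcl2fastq argument list with short-read masking turned off."""
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", run_folder,
        "--output-dir", output_dir,
        # The flag suggested in the answer: 0 disables N-masking of short reads.
        "--mask-short-adapter-reads", "0",
    ]
    if min_trimmed_read_length is not None:
        # Assumption: only needed if short reads are still being padded or masked.
        cmd += ["--minimum-trimmed-read-length", str(min_trimmed_read_length)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; print the command instead of running it.
    print(shlex.join(bcl2fastq_command("/path/to/run_folder", "/path/to/fastq_out")))
```

The printed command can be inspected and then run by hand, or passed to `subprocess.run(cmd, check=True)` on a machine where bcl2fastq is installed.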

If that setting was already in use, a second possibility is that CEL-Seq (I'm not sure what it is) leads to low nucleotide diversity in the first 15 cycles (e.g. all A's at a particular cycle). The NextSeq image analysis software may have trouble telling clusters apart, leading to N base calls. Sequencers differ in how well they cope with unusual libraries: MiSeq is generally the best, followed by HiSeq; NextSeq is likely at the bottom of that list.
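One way to check the low-diversity hypothesis is to tabulate base composition per cycle over the first 15 bases of R1. The following is a minimal sketch: the function names are illustrative, and the 0.8 cutoff for calling a cycle "low diversity" is an arbitrary assumption, not an Illumina threshold.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_cycle_composition(seqs: Iterable[str], cycles: int = 15) -> List[Dict[str, float]]:
    """Return, for each of the first `cycles` positions, the fraction of each base."""
    counts = [Counter() for _ in range(cycles)]
    for seq in seqs:
        for i, base in enumerate(seq[:cycles].upper()):
            counts[i][base] += 1
    composition = []
    for counter in counts:
        total = sum(counter.values())
        composition.append(
            {base: n / total for base, n in counter.items()} if total else {}
        )
    return composition


def low_diversity_cycles(
    composition: List[Dict[str, float]], threshold: float = 0.8
) -> List[int]:
    """Return 1-based cycle numbers where a single base exceeds `threshold`."""
    return [
        i + 1
        for i, fractions in enumerate(composition)
        if fractions and max(fractions.values()) > threshold
    ]
```

Here `seqs` can be any iterable of read sequences, for example the second line of every FASTQ record. Cycles dominated by one base (or by N) would be consistent with the diversity explanation, though they would not prove it.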