Next-Seq sequences map poorly to ref genome
6.3 years ago
exin ▴ 60

Hi

We've just started using the Illumina NextSeq platform and haven't been getting good results sequencing our CEL-Seq2 library (75-cycle, R1 = 15 bp, R2 = 77 bp), in contrast to what we were getting previously on the HiSeq.

QC apparently "looks fine", but I'm not convinced, given that the deviation bars in FastQC are huge. %Q30 is ~78%.
The problem is that only ~20% of reads map to the reference genome, even after discarding sequences with an average Q-score < 30. The tech advised that we had probably overloaded the DNA.

We paid for another run of the same library at 30% lower loading (~0.9 pM) with a 20% PhiX spike-in. The same problem persists. %Q30 is better at 88%, but again the sequences are full of N's (20% of R1 sequences are all N's) and only ~20% map.
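For anyone wanting to reproduce the "20% of R1 sequences are all N's" figure on their own files, here is a minimal Python sketch. It assumes standard 4-line FASTQ records; the function names (`open_fastq`, `fastq_sequences`, `n_summary`) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def n_summary(seqs: Iterable[str]) -> Dict[str, float]:
    """Count reads that are entirely N and reads that contain at least one N."""
    total = all_n = any_n = 0
    for seq in seqs:
        if not seq:
            continue  # skip empty sequence lines
        total += 1
        n_count = seq.count("N")
        if n_count == len(seq):
            all_n += 1
        if n_count > 0:
            any_n += 1
    return {
        "reads": total,
        "all_n_fraction": all_n / total if total else 0.0,
        "any_n_fraction": any_n / total if total else 0.0,
    }
```

Usage would be something like `with open_fastq("R1.fastq.gz") as fh: print(n_summary(fastq_sequences(fh)))`, where the file name is a placeholder. Running it on R1 and R2 separately shows whether the N's are confined to the short read.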

We've been told it could be the library. But there is definitely enough DNA in the ~200-400 bp size range.

I'd like to get an opinion on whether this is more likely a library-prep problem (old reagents?) or a sequencing problem (settings?). What could be the triggering issue?

Thank you!!!

Screen_Shot_2018_08_24_at_6_11_32_pm

Screen_Shot_2018_08_24_at_6_15_55_pm

sequencing rna-seq fastqc cel-seq2 • 2.9k views
ADD COMMENT
1
Entering edit mode

What did your PhiX results look like? Same story?

ADD REPLY
0
Entering edit mode

Hmmm.... I kind of assumed PhiX reads were not included in my fastq files and were only used for calculating the error rates. I might have to double check then. Thank you.

ADD REPLY
1
Entering edit mode

Not 100% sure with the NextSeq. The software on the machine may have filtered them out already, but they got sequenced, so I would expect the data to be available. You might have to re-extract them from the BCL files.

It would tell you if it’s your input DNA that’s the issue though, if the PhiX looks good.

ADD REPLY
0
Entering edit mode

Does the cluster density look okay?

ADD REPLY
0
Entering edit mode

1st run: ~190. 2nd run: ~95. (I'm not sure how this converts to the recommended 170-220 K/mm² range for NextSeq; probably the same scale?)

ADD REPLY
0
Entering edit mode

Have a look in the RunCompletionStatus.xml file for ClusterDensity.

ADD REPLY
0
Entering edit mode

That's the number. 95

ADD REPLY
0
Entering edit mode

If it says <ClusterDensity>95</ClusterDensity> then it's quite low. It's under-clustered.
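A rough Python sketch for pulling that value out of the file and comparing it with the 170-220 range quoted above. Two assumptions to flag: that the number is on the same K/mm² scale as the recommended range (the thread itself is not sure), and that the element is simply called `ClusterDensity` somewhere in the document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def cluster_density_from_xml(xml_data: Union[str, bytes]) -> Optional[float]:
    """Return the first <ClusterDensity> value found, or None if absent."""
    root = ET.fromstring(xml_data)
    for elem in root.iter():
        # Strip any "{namespace}" prefix before comparing the tag name.
        if elem.tag.split("}")[-1] == "ClusterDensity" and elem.text:
            return float(elem.text.strip())
    return None


def classify_density(density: float, low: float = 170.0, high: float = 220.0) -> str:
    """Compare against the range quoted in this thread (assumed K/mm^2)."""
    if density < low:
        return "under-clustered"
    if density > high:
        return "over-clustered"
    return "within range"
```

With the real file you could read it in binary mode, e.g. `cluster_density_from_xml(open("RunCompletionStatus.xml", "rb").read())`, and pass the result to `classify_density`.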

ADD REPLY
1
Entering edit mode
6.2 years ago
GenoMax 147k

I think the problem here may be a particular option used with bcl2fastq. If a read is shorter than 22 bp (which your R1 is), it is masked with N's by default. I have a hunch that is what may be happening with this run. You will need to specify --mask-short-adapter-reads 0 to turn that masking off. Then you should be able to recover sequence for all R1 reads.
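As a sketch of what re-running the conversion might look like, here is a small Python helper that assembles the bcl2fastq command line with the masking turned off. The run folder and output paths are placeholders, the helper name is made up, and a real run will likely need further options (sample sheet, threads, and so on) that depend on the facility's setup.

```python
import shlex
from typing import List


def bcl2fastq_command(run_folder: str, output_dir: str) -> List[str]:
    """Build a bcl2fastq argument list that disables short-read masking."""
    return [
        "bcl2fastq",
        "--runfolder-dir", run_folder,      # placeholder path to the run folder
        "--output-dir", output_dir,         # placeholder path for the FASTQ output
        "--mask-short-adapter-reads", "0",  # the setting suggested in the answer above
    ]


if __name__ == "__main__":
    cmd = bcl2fastq_command("/path/to/run_folder", "/path/to/fastq_out")
    # Print the command for inspection; hand the list to subprocess.run(cmd, check=True)
    # on a machine where bcl2fastq is installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the paths contain spaces.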

If the above setting was already in use, then a second possibility is that CEL-Seq (not sure what it is) may be leading to low nucleotide diversity in the first 15 cycles (e.g. all A's at a particular cycle). The NextSeq image analysis software may be having trouble telling clusters apart, leading to N base calls. Sequencers differ in how well they cope with unusual libraries. MiSeq is generally the best, followed by HiSeq. NextSeq is likely at the bottom of that list.
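To check the low-diversity idea directly, one can tabulate base composition per cycle for the first 15 bases of R1. A minimal Python sketch follows; the 0.8 cutoff for calling a cycle "low diversity" is an arbitrary value chosen for illustration, not an Illumina specification, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_cycle_composition(seqs: Iterable[str], n_cycles: int = 15) -> List[Dict[str, float]]:
    """Return, for each cycle, the fraction of reads showing A, C, G, T or N."""
    counts = [Counter() for _ in range(n_cycles)]
    for seq in seqs:
        for i, base in enumerate(seq[:n_cycles].upper()):
            counts[i][base] += 1
    composition = []
    for c in counts:
        total = sum(c.values())
        composition.append({b: c[b] / total for b in "ACGTN"} if total else {})
    return composition


def low_diversity_cycles(composition: List[Dict[str, float]],
                         max_fraction: float = 0.8) -> List[int]:
    """Return 1-based cycle numbers where a single base exceeds max_fraction."""
    flagged = []
    for cycle, fractions in enumerate(composition, start=1):
        if fractions and max(fractions[b] for b in "ACGT") > max_fraction:
            flagged.append(cycle)
    return flagged
```

Feeding it the R1 sequence lines from the FASTQ gives a per-cycle table comparable to the FastQC "per base sequence content" plot, but as numbers that can be compared between the HiSeq and NextSeq runs.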

ADD COMMENT
0
Entering edit mode

Thank you for your insights! I didn't do the conversion myself; the sequences were received as FASTQ files, so the facility probably handled that. I might check with them. But all the R1 reads are 15 bp, so wouldn't they all be masked then? The first 15 cycles do have low nucleotide diversity. Looks like we're better off going back to HiSeq...

ADD REPLY
