Question

Discrepancy between Nextera PE Illumina sequence and Sanger sequence of the same sample?

0

6.2 years ago

zayed.saud • 0

Hi all,

I recently sequenced the exact same sample (within the space of 24 hours) in two ways: Sanger sequencing of a given region on an ABI 3130, and NGS (Nextera PE on a NextSeq 500). The sample was a lyophilized live attenuated vaccine (a highly passaged double-stranded DNA virus).

To my surprise (and disappointment), the sequences don't match. I've attached a screenshot of the Sanger sequence aligned to the reference genome (image 1). There don't appear to be any secondary peaks under the main peaks that would suggest a heterogeneous sample. I've also attached the mapped reads of the Illumina data, and no reads contain the two variations seen in the Sanger data.

Does anyone have any idea what could be going on?

Thanks in advance

Sanger Sequence showing two variations

NGS of first position with variation in Sanger read

NGS of second position with variation in Sanger read

next-gen sequencing assembly genome gene • 1.6k views

ADD COMMENT • link updated 6.2 years ago by swbarnes2 14k • written 6.2 years ago by zayed.saud • 0

0

Show the bit to the left on the top one. Given the other samples in there, it looks like the polyT quality is decreasing toward the end in the Sanger reads and you're just missing a T call.

Or do you mean the G->A and T->C SNPs?

ADD REPLY • link 6.2 years ago by Devon Ryan 104k

0

Apologies, I mean the G>A and T>C SNPs. The first columns of both mapped-read images are the positions corresponding to the SNPs in the NGS data (I know it's quite hard to see from the images). But there was no evidence of an adenine in any of the mapped reads in the first column of image 2 (corresponding to the G>A SNP), or of a cytosine in any of the mapped reads in the first column of image 3 (corresponding to the T>C SNP).

ADD REPLY • link 6.2 years ago by zayed.saud • 0

0

In addition to the reply from swbarnes2, please comment on the mapping quality of the Illumina reads. Also, do you have Sanger reads in the other orientation or from a separate primer site that cover these positions?

ADD REPLY • link 6.2 years ago by Devon Ryan 104k

0

Can you add a bit of information about how you aligned the reads? Were the Sanger reads aligned to the genome or to a locus?

I'm not so familiar with viral sequencing though.

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

The Sanger reads were aligned (using the function in UGENE) to the exact same genome that was used to map the NGS reads (with Bowtie2). I also ran a de novo assembly and used Bandage to BLAST-search the contigs, but I couldn't find the other sequence anywhere in the NGS reads.

ADD REPLY • link 6.2 years ago by zayed.saud • 0

score 2 · Answer 1 · 2018-09-11

2

6.2 years ago

swbarnes2 14k

I can't make heads or tails of how your Sanger data is supposed to correspond to your Illumina data. You need to zoom in on the region of interest, and show more flanking information.

Aside from the obvious possibility of sample mix-up, are there more than just those two SNPs? Does the Illumina data show any SNPs under the Sanger primer sequences?

One possible explanation is you have a mix of two sequences, but the Sanger primers were only sequencing off of one sequence, and the aligner is only aligning reads of the other to your reference.

The other obvious quick thing to do is to use grep to search your fastq, in case reads with the alternate sequence exist, but are not aligning to the reference for some reason.
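A minimal sketch of such a search, written in Python rather than grep so that both strands are checked in one pass. It assumes a standard four-line-per-record FASTQ (optionally gzipped); the function names `reverse_complement` and `find_reads` and the file name in the usage note are hypothetical, not part of any tool mentioned in this thread.

```python
import gzip

# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_reads(fastq_path, query):
    """Yield (read_name, sequence) for reads containing `query` on either strand.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    query = query.upper()
    rc_query = reverse_complement(query)
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:          # end of file
                break
            seq = handle.readline().strip().upper()
            handle.readline()       # '+' separator line
            handle.readline()       # quality line
            if query in seq or rc_query in seq:
                yield header.strip()[1:], seq
```

To use it, copy roughly 20 bases of the Sanger read spanning the discrepant positions and pass them as `query`, for example `list(find_reads("sample_R1.fastq.gz", sanger_window))`, then repeat for the second read file. An exact-match search will miss reads with a sequencing error inside the window, so a shorter window around each SNP is worth trying as well.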

ADD COMMENT • link 6.2 years ago by swbarnes2 14k

0

Apologies once again, I was a bit hasty in uploading the images and I don't think I explained myself in the best possible way.

The top image has two SNPs. At position 50611, an adenine in the genome is a guanine in the Sanger sequence. The image below shows the reads mapped to the same genome, with the very first column representing the A>G SNP position (there's a 50611 above that column). The second SNP is at position 50617, which corresponds to the very first column in the third image (50617 above that column).

The Sanger sequences have been quality trimmed to an average LOR QV 20 >= 20. A de novo assembly using SPAdes shows the same discrepancy; the assembly showed only a couple of nodes in Bandage, and nowhere near the genomic region in question. I did not quality trim the NGS data for fear of losing a read that might correspond to the Sanger data.

I'll check the other Sanger sequence asap. Unfortunately, the sequence above was at the end of the gene region, so it was sequenced by the forward primer but falls in the primer binding region of the reverse sequence. I'll use a grep search on the fastq (it's not something I'm familiar with, so a useful program or tips on the best way to do this would be much appreciated).

ADD REPLY • link 6.2 years ago by zayed.saud • 0

1

If you want help, you need to provide images that help. You need to zoom in and show at least a dozen bases of context on both sides of your problem. So your discrepancy is literally sitting on your reverse Sanger primer? Did you use that reverse primer to do PCR prior to Sanger sequencing? Does your discrepant sequence happen to match the reverse primer sequence?
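One way to check that last question is sketched below, assuming you have the reverse primer, the Sanger consensus, and the matching stretch of the reference as plain strings. `best_primer_site` is a hypothetical helper, positions are 0-based offsets into the string you pass in (not genome coordinates), and only ungapped matches are considered.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_primer_site(template, primer, reverse=True):
    """Slide the primer along the template and return the best ungapped hit.

    Returns (start, mismatch_positions), or None if the template is shorter
    than the primer. With reverse=True the primer is reverse-complemented
    first, as appropriate for a reverse primer against a top-strand template.
    """
    template = template.upper()
    probe = reverse_complement(primer) if reverse else primer.upper()
    best = None
    for start in range(len(template) - len(probe) + 1):
        window = template[start:start + len(probe)]
        mism = [start + i for i, (t, p) in enumerate(zip(window, probe)) if t != p]
        if best is None or len(mism) < len(best[1]):
            best = (start, mism)
    return best
```

If the primer sits on the Sanger consensus with no mismatches but on the reference with mismatches exactly at the two discrepant positions, that is consistent with the Sanger bases at those positions coming from the primer itself rather than from the sample.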

ADD REPLY • link 6.2 years ago by swbarnes2 14k