Question

How can we analyze a Sanger sequencing chromatogram?

Elnaaz, 10.1 years ago

Dear Friends,

Do you know how to analyze a Sanger sequencing chromatogram?

I have sequenced my samples with a specific primer using Sanger sequencing, to find some SNPs in individual genotypes of tetraploid plants.

I do not know what a double peak in the chromatogram means.

I also do not know how to align these chromatograms with my pool data in Tablet.

Thanks if you can help me,

Eli

Tags: SNP, sequencing, alignment, sanger

Dear Devon,

Sorry for the confusion.

I do not want to use BAM files with the Sanger data; I just have Sanger sequences of individuals.

To make it clear, here is my setup:

First: I have 96 pools, each including 8 individual genotypes, sequenced by Illumina (the plants are tetraploid).

Second: I found a few SNPs in these 96 pools, in just 12 of them.

Then: I Sanger-sequenced the individuals from these 12 SNP pools to find out which individual in each pool carries the SNP.

Finally, I have Sanger chromatograms for many individuals from my SNP pools, and now I need to confirm my Illumina results with these Sanger sequences.

I am totally confused. I am sure that I have to align these individuals to my reference gene, but how can I align them with the Illumina BAM files for my pools? :(

Reply by Elnaaz, 10.1 years ago

No worries.

Firstly, you don't need to align the Sanger sequences to the whole genome or view them in the same program as the BAM files. From your pooled Illumina sequencing, you should have a VCF (or some other format) file with the SNP locations. Just extract the reference sequence for the region that should be covered by the Sanger sequencing and align the Sanger reads (in Sequencher, or whatever you prefer) to that. You know its coordinates, so if that region covers chr1:1000-1500 and the SNP is at position 1200, you know to look at about position 200 in the Sanger alignment (exactly position 201 if the coordinates are 1-based and inclusive). There's no reason to then go back to the Illumina reads; you're already done with them. You're confirming the Illumina findings with Sanger sequencing, not the other way around (if someone tells you otherwise, ignore them; they don't have a clue what they're doing).
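A minimal Python sketch of that bookkeeping, assuming a plain-text reference FASTA and 1-based inclusive coordinates (as used in VCF). The function names are hypothetical, and the toy sequence stands in for a real reference. With an indexed FASTA, `samtools faidx ref.fa chr1:1000-1500` does the same extraction from the command line.

```python
def read_fasta(text):
    """Parse FASTA text into {sequence_name: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def extract_region(genome, chrom, start, end):
    """Return chrom:start-end using 1-based, inclusive coordinates (as in VCF)."""
    return genome[chrom][start - 1:end]


def position_in_region(region_start, snp_pos):
    """1-based position of a SNP inside a region that starts at region_start."""
    return snp_pos - region_start + 1


if __name__ == "__main__":
    # Toy reference (hypothetical); in practice, read your real reference FASTA
    genome = read_fasta(">chr1 toy\nACGTACGTAC\nGTACGT\n")
    print(extract_region(genome, "chr1", 3, 8))  # GTACGT
    print(position_in_region(1000, 1200))        # 201
```

The extracted region is what you align the Sanger reads against; the SNP's offset in that region tells you which column of the alignment to inspect.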

Reply by Devon Ryan, 10.1 years ago

Dear Devon,

How can I extract, from the Illumina data, the sequence of the region that should be covered by the Sanger read?

I cannot copy or select a specific sequence there: the data are BAM files in Tablet, which only has a "jump to base" tool for finding a single base at a specific position.

Reply by Elnaaz, 10.1 years ago

You don't need to extract sequence from the Illumina dataset, but from the reference genome.

Reply by Devon Ryan, 10.1 years ago

So your last example shows a tetraploid (with 4 alleles)? If so, I think it means we have A and T together at position 1283 as a SNP: instead of AAATACTT we have AAATTCTT. Maybe it means that just one allele carries the changed nucleotide (T) while the other 3 carry A.

But I'm not sure whether that's right.

Reply by Elnaaz, 10.1 years ago

Yes, as I said in one of my comments, that's an example with 3 copies of A and one of T (it's alfalfa DNA, so it's tetraploid).

Reply by Devon Ryan, 10.1 years ago

Yes, it's completely true.

Reply by Elnaaz, 10.1 years ago

Ram · Answer 1 · 2014-11-06

Devon Ryan, 10.1 years ago

In general, you need to load the chromatograms into a program like Sequencher (there are many options out there, but that's what I used to use) and then look for SNPs in the chromatograms (in practice, they're usually marked with an N...but not always). Most tools have diploid organisms in mind, where a heterozygous SNP looks like this:

[Image: chromatogram of a heterozygous T/C SNP in a diploid]

So that's a heterozygous T/C SNP. A homozygous variant would instead show a single peak that disagrees with the reference. With a tetraploid organism, things will likely not be as clear-cut:

[Image: chromatogram of a SNP in tetraploid alfalfa DNA (3 copies of A, 1 of T)]

That's an example from alfalfa, which is also tetraploid.
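To make the "double peak" idea concrete, here is a simplified Python sketch that flags a position as mixed when the second-highest peak reaches some fraction of the highest one. The 0.2 threshold, the function names and the per-position peak-height dicts are hypothetical assumptions; real base callers typically also consider peak position, shape and spacing. Some tools mark such positions with an N; this sketch uses IUPAC ambiguity codes instead.

```python
# IUPAC ambiguity codes for two-base mixtures
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(heights, min_ratio=0.2):
    """Call one position from peak heights, e.g. {"A": 900, "C": 20, "G": 10, "T": 15}.

    If the second-highest peak is at least min_ratio of the highest one,
    the position is reported as a two-base mixture (IUPAC code).
    """
    ranked = sorted(heights.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top_h), (second_base, second_h) = ranked[0], ranked[1]
    if top_h > 0 and second_h / top_h >= min_ratio:
        return IUPAC[frozenset((top_base, second_base))]
    return top_base


def call_trace(peaks, min_ratio=0.2):
    """Call every position in a list of per-position peak-height dicts."""
    return "".join(call_position(h, min_ratio) for h in peaks)


if __name__ == "__main__":
    # Made-up heights: a clean A, then a T/C double peak
    peaks = [{"A": 900, "C": 20, "G": 10, "T": 15},
             {"A": 10, "C": 400, "G": 12, "T": 500}]
    print(call_trace(peaks))  # AY
```

Note that a fixed ratio threshold is tuned with diploids in mind; in a tetraploid a 1-in-4 allele gives a smaller secondary peak, so the threshold matters more.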


Thanks, Demon, for your clear description and diagrams.

I still do not know how to align these chromatogram sequences with the BAM files of my pools in Tablet.

If somebody knows how, I would be thankful for an explanation.

Reply by Elnaaz, 10.1 years ago

His name is Devon, not Demon :)

Reply by Manvendra Singh, 10.1 years ago

If I start going by "Demon" then perhaps my colleagues will bring me offerings of coffee and cake when they have questions. I may need to try this... :)

Reply by Devon Ryan, 10.1 years ago

One usually doesn't use BAM files or NGS-specific tools with Sanger sequencing. In most cases you can't view a BAM file as a meaningful chromatogram, unless the second most likely base and its associated scores were stored as auxiliary tags, which would be atypical.

Reply by Devon Ryan, 10.1 years ago

Can you reliably call SNPs using Sanger data? I was under the impression that it doesn't always detect them, and that it is poor at capturing them even when you get a mixed trace.

Reply by pld, 10.1 years ago

Yes, Sanger is the gold standard for variant calling. It's just slow and expensive, which is why we all prefer NGS.

Reply by Devon Ryan, 10.1 years ago

Do you have any publications on this?

Everything I found was basically NGS methods papers that start out by saying how bad Sanger is at this, especially with data from complex mixtures of genotypes and low-frequency variants (e.g. virus, tumor).

E.g. http://www.nature.com/nm/journal/v12/n7/full/nm1437.html

This paper (from 2006) cites a 75% failure rate to detect mutant alleles in tumor biopsies.

I thought at most one could say qualitatively that at a position with a mixed trace, there may be multiple variants.

Maybe this isn't exactly the same issue as the OP's.

Reply by pld, 10.0 years ago

Well, Sanger sequencing was used for a long time in forensics. That is: detecting sequence variants specific to the crime scene. Not any variant in the genome, just those in a set of selected (and tested) loci.

The paper you cite states:

The sensitivity of conventional DNA sequencing in tumor biopsies is limited by stromal contamination and by genetic heterogeneity within the cancer

So you are comparing the detection of variants present in an unknown but small subpopulation of clonal cells (impossible without high coverage, since these rare variants are diluted by normal DNA) to shooting fish in a barrel, i.e. looking at variants in presumably identical tetraploid cells.

Reply by Darked89, 10.0 years ago

Thanks for reporting on the original source!

Reply by Devon Ryan, 10.0 years ago

I'd have to look around for references, this is one of those "community knowledge" sorts of things that's rarely referenced.

BTW, that 75% figure ends up coming from a 1999 editorial in the Lancet. I don't know how much I'd read into it. I don't have access to the original article (we have terrible institutional access), so it's hard to know what's really being compared there. Certainly NGS will end up being better in highly heterogeneous samples (i.e., with a lot of underlying genotypes in a given area), since you'll end up with crap-quality Sanger reads there.

Reply by Devon Ryan, 10.0 years ago

Thanks Joe.

I already tried to find SNPs using Sanger, but it's a little difficult since my individuals are tetraploid: when I see a double (heterozygous) peak, I cannot tell whether it means that 2 of the 4 alleles are different.

Reply by Elnaaz, 10.1 years ago

Dear Devon,

Thanks so much again.

From the nice graphs you sent, I still could not see how to tell, in the heterozygous tetraploid example, whether it is simplex, duplex, etc., since, as you know, in tetraploids it is difficult to determine the type of SNP across 4 alleles.

Reply by Elnaaz, 10.1 years ago

What you end up doing is comparing peak heights. In that example, it's 3 copies of A and one copy of T. This method isn't perfect, but neither is the NGS variant (though to be fair, this is why one uses NGS with pooled data: it's easier to deal with there). I suspect that tetraploid is about the highest ploidy one would want to use Sanger sequencing for (I'm sure someone's tried it with even higher ploidy, so perhaps it still works OK and I've just never seen it).
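As a rough illustration of comparing peak heights, here is a minimal Python sketch that turns the two peak heights at a SNP into an estimated allele dosage for a tetraploid. It assumes peak height is roughly proportional to copy number, which is only approximately true for Sanger traces; the function name and example heights are hypothetical. The dosage labels (nulliplex to quadruplex) are the usual terms for counting copies of the alternate allele in an autotetraploid.

```python
DOSAGE_NAMES = {0: "nulliplex", 1: "simplex", 2: "duplex", 3: "triplex", 4: "quadruplex"}


def estimate_dosage(ref_height, alt_height, ploidy=4):
    """Estimate how many of `ploidy` copies carry the alternate base.

    Returns (rounded dosage, alternate-peak fraction). The fraction is
    returned too, so borderline cases (e.g. 0.375) can be flagged by hand.
    """
    total = ref_height + alt_height
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    alt_fraction = alt_height / total
    return round(alt_fraction * ploidy), alt_fraction


if __name__ == "__main__":
    # Made-up heights resembling the alfalfa example: tall A peak, small T peak
    dosage, frac = estimate_dosage(ref_height=300, alt_height=100)
    print(DOSAGE_NAMES[dosage], frac)  # simplex 0.25
```

Because peak heights vary with sequence context, comparing against a sample of known dosage at the same position is more reliable than trusting the ratio alone.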

Reply by Devon Ryan, 10.1 years ago

Do you know how I can find a specific sequenced gene fragment in IGV?

Should I copy the desired part from the FASTA file and paste it into IGV, or not?

Reply by Elnaaz, 10.0 years ago

You'd need to know its location to find it in IGV.

Reply by Devon Ryan, 10.0 years ago

How can I find the location? In the reference sequence?

Reply by Elnaaz, 10.0 years ago

Well you know what you sequenced...

You should really be able to figure this out yourself.

Reply by Devon Ryan, 10.0 years ago

Do you mean from the PCR product size? Or from the forward primer start position?

Reply by Elnaaz, 10.0 years ago

Neither. You already have all the information and help you need to figure this out.

Reply by Devon Ryan, 10.0 years ago

Ram · Answer 2 · 2014-11-07

Darked89, 10.1 years ago

This is quite an old manual on how to detect mutations using Staden:

I assume you are looking into base substitutions, because dealing with indels/short repeat polymorphisms in a tetraploid species without subcloning your PCR products will probably not be good enough to report in a publication.


Dear Darked,

We certainly have some indels among our SNPs, in addition to base substitutions, and also splicing. What would you suggest? Instead of Sanger sequencing individuals with specific primers, do you think we need to subclone every genotype in our pools that has an indel?

I have another question about splicing, which our analysis reported in the results. Can anyone explain how splicing can affect protein function?

Reply by Elnaaz, 10.1 years ago

What is the size of your PCR products? I assume you must be talking about PCR-ing cDNAs, since splicing of the genomic DNA would be something, well, not typical?

In short, if your 2 reads from opposite ends of the PCR product overlap significantly, you would be able to detect indels at the genomic level in a diploid organism. With a tetraploid, plus splicing, I would at least try to separately sequence differently sized PCR bands cut from the gel. But if you want to publish this, then either go with subcloning of the PCR bands, or switch to NGS RNA-seq, preferably with 2x100 (or longer) reads.
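A small Python sketch of the two pieces of bookkeeping behind "reads from opposite ends overlap": the reverse-primer read has to be reverse-complemented before it can be compared with the forward read, and the expected overlap is the sum of the usable read lengths minus the amplicon length. It assumes each read starts right at its primer; the function names and the 450/600 bp figures are hypothetical.

```python
# Complement table for A/C/G/T/N; other IUPAC codes are left unchanged in this sketch
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a read so a reverse-primer read matches the forward strand."""
    return seq.translate(COMPLEMENT)[::-1]


def read_overlap(fwd_len, rev_len, amplicon_len):
    """Bases covered by both reads, assuming each read starts at its primer."""
    return max(0, fwd_len + rev_len - amplicon_len)


if __name__ == "__main__":
    print(reverse_complement("AACG"))   # CGTT
    print(read_overlap(450, 450, 600))  # 300
```

Only the overlapping stretch is covered on both strands, which is where an indel can be checked from two independent reads.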

Re splicing and function:

No idea. Or rather: too complex to answer in a few lines.

Just keep in mind that not everything you get from PCR-ing RNA from the cell is "real". You can have pre-mRNA with unspliced introns, splicing errors, etc. Unless the thing is dominant and consistent between experiments, it may be an artifact.

Reply by Darked89, 10.0 years ago

No, I do not have RNA-seq. I just have DNA sequenced by Illumina in 96 pools, with 8 individuals per pool, and my sequence data are in the Tablet software. Now I am doing Sanger sequencing with specific primers to confirm and follow up the Illumina sequencing.

Reply by Elnaaz, 10.0 years ago

Do you know how I can find a specific sequenced gene fragment in IGV?

Should I copy the desired part from the FASTA file and paste it into IGV, or not?

Reply by Elnaaz, 10.0 years ago

There are specific tools for viewing Sanger chromatograms, and it makes no sense to try to load these chromatogram files into programs which simply cannot handle them (like IGV). If you cannot see the chromatogram and only use a plain FASTA, you cannot make any calls as to what is going on regarding SNPs/indels. Sometimes the sequencing run is longer than your insert, or it gets killed for other reasons, and you get random noise base calls.
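One common way to drop those noisy ends is quality trimming in the style of Richard Mott's algorithm (Biopython's SeqIO also offers an "abi-trim" format that applies a modified version of it). Below is a minimal Python sketch operating on a list of Phred qualities; the 0.05 error-probability limit is an assumed default, and the function name is hypothetical. It does not replace looking at the trace itself.

```python
def mott_trim(quals, limit=0.05):
    """Return (start, end) of the best-quality stretch of a read (end exclusive).

    Each base scores `limit - error_probability`; the kept stretch is the
    contiguous run with the highest total score (Kadane's algorithm).
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        score = limit - 10 ** (-q / 10)  # Phred quality -> error probability
        if run_sum + score <= 0:
            # A stretch ending here cannot help, so restart after this base
            run_sum, run_start = 0.0, i + 1
        else:
            run_sum += score
            if run_sum > best_sum:
                best_sum, best = run_sum, (run_start, i + 1)
    return best


if __name__ == "__main__":
    quals = [5, 5, 30, 30, 30, 30, 5, 5]  # made-up: noisy ends, good middle
    start, end = mott_trim(quals)
    print(start, end)  # 2 6
```

Slicing the sequence and qualities with `[start:end]` gives the trimmed read; a read whose result is `(0, 0)` has no usable stretch at all.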

In case you have a lot of Sanger sequences plus FASTQ from NGS, and simply refuse to use e.g. the Staden package listed above, then you need some genome sequence from your species, even if it is a toy "genome" in the form of contigs of interest. Convert your Sanger files to FASTQ (e.g. by using Biopython: http://biopython.org/wiki/SeqIO) or to FASTA, then map them to your toy genome and load them into IGV. You are still out of luck if, instead of a clear homozygote different from the reference, you get an "N" base in your FASTA/FASTQ file, and not in just one read but in multiple reads. That will bring you back to square one: look at your Sanger chromatograms.
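A hedged Python sketch of the conversion step. `to_fastq` writes one FASTQ record with standard Phred+33 quality encoding; `ab1_to_fastq` reads an ABI chromatogram with Biopython's `SeqIO.read(path, "abi")`, which exposes the per-base qualities as `letter_annotations["phred_quality"]`. Both function names are hypothetical, and Biopython is only imported when an .ab1 file is actually read.

```python
def to_fastq(name, seq, quals):
    """Format one read as a FASTQ record using Phred+33 quality encoding."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    qual_str = "".join(chr(q + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


def ab1_to_fastq(path):
    """Read an ABI chromatogram with Biopython and return it as FASTQ text."""
    from Bio import SeqIO  # requires Biopython; imported lazily so to_fastq works without it
    record = SeqIO.read(path, "abi")
    return to_fastq(record.id, str(record.seq), record.letter_annotations["phred_quality"])


if __name__ == "__main__":
    print(to_fastq("read1", "ACG", [30, 20, 40]), end="")
```

Biopython can also do this in a single call, e.g. `SeqIO.convert("sample.ab1", "abi", "sample.fastq", "fastq")` (file names are placeholders). Either way, a mixed peak still ends up as a single base call or an N, which is why the chromatogram remains the final check.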

Reply by Darked89, 10.0 years ago