I'm hunting for rare mutations that could be causal for a rare disease, based on exome data.
I just found two nonsynonymous mutations only 3 bp apart, which alter two consecutive amino acids.
UCSC actually flags both as "common SNPs" (UCSC defines a "rare mutation" as one with MAF <= 1%, whereas I loosened the cutoff to 5% in my analysis). What I'm wondering is: are these two mutations on one haplotype? That is, are they always transmitted together, or independently? If they are always transmitted together, the MAF of the two-mutation combination is simply ~5%, which rules them out as candidates, because 5% does not sound rare enough for a rare disease. If they are independent, the combination could be as rare as 5% * 5% = 0.25%, which would make them a perfect candidate for a rare disease.
There are a few different things to keep in mind. From the talks I have heard and papers I have read on MAFs and the likelihood of a variant being causal, strict filtering on the magic 1.5% cutoff might not be a good idea. If the variant is in dbSNP129 or earlier then yes, it is probably fairly common, but estimates from 1000 Genomes may be inflated. You may also be dealing with a population different enough that the MAF estimate isn't worth as much.
If your variants are 3 bp apart, it should be easy enough to see from the raw data whether they are compound heterozygotes or not. If they appear on the same reads, they represent a haplotype (or a sequencing error). Otherwise they may be compound hets. Verifying is usually as simple as Sanger sequencing the relevant exon in your sample's parents: for compound hets, each parent should be heterozygous for one of the variants, but not both.
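To make the "same reads or not" check concrete, here is a minimal sketch in plain Python. It assumes you have already pulled out the reads overlapping the region as (start, sequence) pairs with gapless alignments (no indels or soft clips), which is a simplification; on real data you would normally eyeball this in a genome browser or read the BAM with a proper library. The function names and the toy reads are made up for illustration.

```python
from collections import Counter

def phase_by_reads(reads, pos1, alt1, pos2, alt2):
    """Count allele combinations on reads that span both variant positions.

    reads: iterable of (start, sequence), 0-based start, gapless alignment
    assumed (hypothetical input format for this sketch).
    Returns a Counter keyed by (has_alt1, has_alt2).
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if not (start <= pos1 < end and start <= pos2 < end):
            continue  # read does not cover both sites, so it is uninformative
        has1 = seq[pos1 - start] == alt1
        has2 = seq[pos2 - start] == alt2
        counts[(has1, has2)] += 1
    return counts

def interpret(counts):
    """Very rough call: 'cis' (same haplotype), 'trans', or 'ambiguous'."""
    both = counts[(True, True)]
    only1 = counts[(True, False)]
    only2 = counts[(False, True)]
    if both and not (only1 or only2):
        return "cis"
    if only1 and only2 and not both:
        return "trans"
    return "ambiguous"

# Toy reads against a made-up reference ACGTACGTAC, variants at 3 (T>G) and 6 (G>A)
cis_reads = [(0, "ACGGACATAC"), (1, "CGGACATA"), (0, "ACGTACGTAC")]
trans_reads = [(0, "ACGGACGTAC"), (0, "ACGTACATAC")]
print(interpret(phase_by_reads(cis_reads, 3, "G", 6, "A")))
print(interpret(phase_by_reads(trans_reads, 3, "G", 6, "A")))
```

A mix of read types, or only a handful of spanning reads, should be treated as ambiguous (sequencing error, mapping problems) rather than forced into one call.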
Look a little more into the source of the SNPs in the database and how many samples they were sequenced in. Check EVS and its massive exome sequencing projects to see whether the variants were seen there as well.
written 12.3 years ago by DG, updated 5.9 years ago by Ram
I second the suggestion to look at your actual reads. If the two SNPs always occur together on the same read, then you really don't need to do Sanger sequencing unless you have some clinical requirements.
Have you done a haplotype analysis on them? That is, do you know the frequency of each mutation, and whether the two always occur together or what percentage of samples carry one without the other? Otherwise I'm a little confused by your question.
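For what it's worth, once you have phased haplotype counts for the two sites, the "do they always occur together" question reduces to standard linkage disequilibrium arithmetic. A rough sketch follows; the function name and the counts are hypothetical, chosen to mirror the 5% MAF in the question, and A and B stand for the two minor alleles.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD from phased haplotype counts (illustrative sketch).

    n_AB: haplotypes carrying both minor alleles
    n_Ab, n_aB: haplotypes carrying only one of them
    n_ab: haplotypes carrying neither
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    expected = p_A * p_B          # double-mutant frequency if independent
    D = p_AB - expected           # raw disequilibrium coefficient
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else float("nan")
    return {"p_AB": p_AB, "expected_if_independent": expected,
            "D": D, "r2": r2, "D_prime": d_prime}

# Made-up example: 100 haplotypes, both variants at 5%, always seen together
stats = ld_stats(n_AB=5, n_Ab=0, n_aB=0, n_ab=95)
print(stats)
```

With these made-up counts the observed double-mutant frequency is 5%, against 0.25% expected under independence, and r2 and D' both come out at about 1, which is the "always transmitted together" case from the question.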
Thanks. No, I've never done a haplotype analysis. That's why I'm asking here whether there is a haplotype database, or sequences from multiple individuals across this region, so that I can see whether the two mutations are inherited together or not.
You weren't very clear that you were asking whether any haplotype databases for human exist. I don't know what else to tell you except to Google it. Maybe someone else here can help you.
I still think doing your own haplotype analysis (using one of the methods I linked to above or some other method) will only help you. The Stacks pipeline won't take long if you're comfortable in the shell.