I have downloaded SNP data from the 1000 Genomes Project through BioMart and the UCSC Genome Browser. These SNPs are annotated as being synonymous or non-synonymous (missense). All the textbooks say that the number of synonymous mutations should be much higher than the number of non-synonymous mutations. Why, then, do I consistently observe a higher number of non-synonymous SNPs for the human genome? Do you think there might be a mistake in the annotation of these SNPs, or is there something else that I am missing?
Can you give us more details about what exactly you are downloading, so that we can check it ourselves? One possible explanation (I may be wrong, so someone please correct me if I am): rare variants are enriched for damaging/non-synonymous changes. The idea is that silent mutations become more frequent in the population, whereas deleterious ones remain infrequent. Since the 1000 Genomes Project is looking at rare variants, this could be one reason. I am not 100% confident about this, so perhaps someone with more knowledge can say more.
Thank you. One of the data sets I downloaded: UCSC Table Browser, human genome assembly hg19, All SNPs(135), with the filter set to validated by 1000genomes and function set to missense, versus the same table with function set to synonymous.
Try removing the function filter and check again. I'm downloading it now and will post my answer as soon as I have the data.
Here are the results when removing the function filter and counting synonymous and non-synonymous SNPs: non-synonymous SNPs = 161737 and synonymous SNPs = 124014
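For anyone who wants to repeat the count on the unfiltered download, here is a minimal sketch. It assumes the function field of each SNP is a comma-separated list of terms such as `missense` and `coding-synon`; the function name `tally_function_terms` and the toy values are made up for illustration and are not taken from the actual table.

```python
from collections import Counter


def tally_function_terms(func_values):
    """Count SNPs per functional term and per combination of terms.

    Assumption: each value is a comma-separated list of terms, so a SNP
    with several terms is counted once under each of them.
    """
    per_term = Counter()
    per_combo = Counter()
    for value in func_values:
        # Deduplicate and sort so "a,b" and "b,a" count as the same combination.
        terms = sorted({t.strip() for t in value.split(",") if t.strip()})
        per_term.update(terms)
        per_combo[",".join(terms)] += 1
    return per_term, per_combo


# Toy values, NOT real data from the table.
example = [
    "missense",
    "coding-synon",
    "coding-synon,intron",
    "intron",
    "missense,coding-synon",
]
per_term, per_combo = tally_function_terms(example)
missense_to_synonymous = per_term["missense"] / per_term["coding-synon"]
```

Counting per combination as well as per term makes it easy to see how many SNPs carry more than one annotation before comparing the two totals.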
I got the same counts as you for coding-synon and for missense. From the different combinations of entries in the function field, it seems that a lot of SNPs carry even contradictory annotations; for instance, 6000+ SNPs are listed as being coding synonymous AND intronic, which is obviously not possible. So I'm not sure how much faith I would place in these annotations. You could try stripping off the functional annotations, making a new file of the positions, and running the ANNOVAR program with hg19 to get fresh, up-to-date annotations. I can't guarantee that this will yield a different result, but it would be interesting to see the comparison, if only for my own peace of mind.
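A rough sketch of the "strip the annotations and keep only the positions" step, written under several assumptions: each row is a dict using the column names I believe the UCSC SNP table has (`chrom`, `chromStart`, `chromEnd`, `strand`, `refUCSC`, `observed`), the start coordinate is 0-based, the observed alleles are reported on the listed strand, and the target is ANNOVAR's plain five-column input (chromosome, 1-based start, end, reference allele, alternative allele). The helper name `to_avinput_lines` is hypothetical; please check the table schema and the ANNOVAR documentation before relying on it.

```python
# Complement table used to flip minus-strand alleles onto the plus strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_avinput_lines(row):
    """Turn one single-base SNP row into 'chr start end ref alt' lines.

    Returns one line per non-reference allele, or an empty list for rows
    that are not simple single-base substitutions.
    """
    alleles = row["observed"].split("/")
    if row["strand"] == "-":
        # Assumption: observed alleles are given on the listed strand.
        alleles = [a.translate(COMPLEMENT)[::-1] for a in alleles]
    ref = row["refUCSC"]
    start = int(row["chromStart"]) + 1  # 0-based start -> 1-based position
    end = int(row["chromEnd"])
    if start != end or ref not in alleles:
        return []  # skip indels and rows that disagree with the reference
    return [
        "\t".join([row["chrom"], str(start), str(end), ref, alt])
        for alt in alleles
        if alt != ref
    ]


# Made-up row for illustration only.
row = {"chrom": "chr1", "chromStart": "99", "chromEnd": "100",
       "strand": "-", "refUCSC": "C", "observed": "A/G"}
lines = to_avinput_lines(row)
```

Rows that are skipped (indels, or alleles that do not include the reference base) would need separate handling; this sketch simply drops them.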
It certainly IS possible for a SNP to have two different annotations, due to alternative splicing: in these cases, the SNP is synonymous in some transcripts of a gene, and not incorporated in others (i.e. is intronic).
If a SNP is intronic, then it is in an intron and will therefore never be incorporated into a transcript, regardless of alternative splicing. Or are exons that are not incorporated into a transcript also called introns? Sorry, a little confused now.
In alternative splicing there are multiple transcripts, so the multiple annotations refer to each possible transcript. For example, look at the top graphic on the Alternative Splicing Wikipedia page. If you had a SNP in the yellow alternative exon, then it would have two annotations: on the left transcript -- exon, and on the right transcript -- intron.
To continue with the wiki example, I thought that introns are the black lines, so any variant contained within an intron could never be included in any transcript. I also thought that in the context of alternative splicing there is "exon shuffling", but whether or not the exons are included in the transcript, they are still exons, and that the two categories are mutually exclusive. But it seems this is not the case? Sorry to go slightly off topic. I just wanted clarification so I can update my understanding of how these annotations work.
Exon/intron only have meaning relative to the specific transcript you're looking at. So in the left transcript the yellow box is an exon and the green box is an intron (spliced out). In the right transcript the green box is an exon and the yellow box is an intron. Even though it is convenient to think of a condensed gene, for the purpose of considering the impact of a change you need to consider each transcript independently. This is why a variation can have multiple annotations.
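To make the per-transcript view concrete, here is a toy sketch. The coordinates, the transcript names `left` and `right`, and the function `annotate_per_transcript` are all invented for illustration; they only mimic the yellow/green alternative exons from the Wikipedia graphic.

```python
def annotate_per_transcript(position, transcripts):
    """Label a position as exon, intron, or outside for each transcript.

    `transcripts` maps a transcript name to a list of (start, end) exon
    intervals, inclusive on both ends.
    """
    labels = {}
    for name, exons in transcripts.items():
        tx_start = min(start for start, _ in exons)
        tx_end = max(end for _, end in exons)
        if any(start <= position <= end for start, end in exons):
            labels[name] = "exon"
        elif tx_start <= position <= tx_end:
            # Inside the transcript span but spliced out of this transcript.
            labels[name] = "intron"
        else:
            labels[name] = "outside"
    return labels


# Two hypothetical transcripts of the same gene sharing the first and last exon.
transcripts = {
    "left": [(100, 200), (300, 400), (700, 800)],   # keeps the "yellow" exon
    "right": [(100, 200), (500, 600), (700, 800)],  # keeps the "green" exon
}
labels = annotate_per_transcript(350, transcripts)
```

Here position 350 comes out as exon in `left` and intron in `right`, which is how a single SNP can end up with both a coding and an intronic annotation.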
Right. In which case I retract what I previously said and apologise for the misinformation. Thanks Brad and Dgmacarthur. Now to go back to school.
Thanks for pointing me to the ANNOVAR program. I will check the annotations again and post the results here.