5.9 years ago
misterie
Hi,
Do you have any ideas on how to compare the distribution of SNPs and InDels between chromosomes? I know I should take the different chromosome sizes into account. I have calculated the number of SNPs and the number of InDels for each chromosome. Should I also take the size of the indels into account?
I want to compare the distribution of SNPs: randomness, and any patterns associated with particular chromosomes.
This is perhaps not what you are looking for in terms of guidance for analysis, but it may help you identify what to look for. Ensembl has per-chromosome summary statistics: the number of coding and non-coding genes and of short variants (defined as <50 bp in length), e.g. for chromosome 1.
I guess what you are looking for is a way to identify which chromosomes, or regions on a chromosome, have a higher or lower than expected number of variants compared to the genome-wide frequency. Have you considered looking into regions that have been shown to be highly conserved evolutionarily?
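To make the "higher/lower than expected" idea concrete, here is a minimal sketch, assuming the null model is that variants fall uniformly along the genome, so each chromosome's expected count is proportional to its length. The function names, chromosome names, lengths and counts are all made up for illustration; in practice you may want to use callable or non-masked length rather than raw chromosome length.

```python
def density_per_mb(observed, chrom_lengths):
    """Variants per megabase for each chromosome."""
    return {c: observed[c] / (chrom_lengths[c] / 1e6) for c in chrom_lengths}


def expected_counts(observed, chrom_lengths):
    """Expected counts if variants were spread in proportion to chromosome length."""
    total = sum(observed.values())
    genome_length = sum(chrom_lengths.values())
    return {c: total * length / genome_length for c, length in chrom_lengths.items()}


def chi_square_by_chromosome(observed, chrom_lengths):
    """Chi-square goodness-of-fit statistic against the length-proportional model.

    Returns (statistic, degrees_of_freedom, per-chromosome contributions).
    """
    expected = expected_counts(observed, chrom_lengths)
    contributions = {
        c: (observed[c] - expected[c]) ** 2 / expected[c] for c in chrom_lengths
    }
    return sum(contributions.values()), len(chrom_lengths) - 1, contributions


# Hypothetical example data (not from any real genome).
lengths = {"chr1": 10_000_000, "chr2": 5_000_000, "chr3": 5_000_000}
snps = {"chr1": 1000, "chr2": 700, "chr3": 300}

stat, dof, contrib = chi_square_by_chromosome(snps, lengths)
print(density_per_mb(snps, lengths))
print(stat, dof, contrib)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` turns the statistic into a p-value. With the large counts typical of whole-genome data the test will nearly always come out significant, so the per-chromosome densities and contributions are usually more informative than the p-value itself.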
Yes, to further elaborate on Erin's final point, as an example, recent in silico prediction tools have compared mutation frequencies against random mutation backgrounds and/or 'derived' alleles that have become fixed (conserved) in the human lineage when compared to our recent ape ancestor. If you want ideas, I would look at some of those recent tools, such as GWAVA, FATHMM MKL, CADD, DANN, etc. Ensembl has much untapped data that can be put to great use.
As a side note: conservation is the single best predictor of functionality of a mutation / variant.
I have my own VCF files containing information about InDels and SNPs (also annotated), and I want to do some comparisons between chromosomes. It is not a human genome.
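For reference, a rough sketch of how the per-chromosome tallies could be pulled from such a VCF using only the standard library. It assumes plain tab-separated VCF records, and the classification rule is deliberately simplistic (my assumption, not a standard): single-base REF and ALT is a SNP, different lengths is an indel, and everything else (symbolic alleles, breakends, MNPs) goes into "other". The function names and the example records are made up. It keeps both the indel count and the total indel length in bp, so either can be normalised by chromosome size.

```python
from collections import defaultdict


def classify(ref, alt):
    """Return (kind, indel_size_bp) for one REF/ALT pair."""
    # Symbolic alleles, missing/overlapping-deletion markers and breakends.
    if alt.startswith("<") or alt in ("*", ".") or "[" in alt or "]" in alt:
        return "other", 0
    if len(ref) == 1 and len(alt) == 1:
        return "snp", 0
    if len(ref) != len(alt):
        return "indel", abs(len(alt) - len(ref))
    return "other", 0  # equal-length multi-base substitutions (MNPs)


def tally_variants(lines):
    """Count SNPs, indels and total indel bp per chromosome from VCF lines."""
    counts = defaultdict(lambda: {"snp": 0, "indel": 0, "indel_bp": 0, "other": 0})
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alts = fields[0], fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites: one tally per ALT
            kind, size = classify(ref, alt)
            counts[chrom][kind] += 1
            counts[chrom]["indel_bp"] += size
    return {chrom: dict(tally) for chrom, tally in counts.items()}


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t.\tPASS\t.",
    "chr1\t300\t.\tC\tCGGG,T\t.\tPASS\t.",
    "chr2\t50\t.\tG\t<DEL>\t.\tPASS\t.",
]
tallies = tally_variants(example)
print(tallies)
```

For a real file, the same function can be fed an open file handle, e.g. `tally_variants(open(path))`, or `gzip.open(path, "rt")` for a compressed VCF. Whether indel count or indel bp is the better quantity depends on the question being asked, so it may be worth looking at both.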