I have a large set of SNPs. Does anyone know a fast way to annotate every SNP with a conservation score?
Ensembl provides GERP conservation scores for a variety of taxonomic levels. They can be accessed through the Perl API and there are ftp dumps here: ftp://ftp.ensembl.org/pub/current_emf/ensembl-compara/
Thanks guys, I found an automated way to do that:
ANNOVAR uses phastCons 46-way alignments to annotate variants that fall within conserved genomic regions. The --regionanno argument needs to be supplied so that the program knows what to do. In addition, --dbtype needs to be specified so that the program knows which annotation database file to interrogate. Make sure that the annotation database is already downloaded (the command is "annotate_variation.pl -downdb mce46way humandb/").
http://www.openbioinformatics.org/annovar/annovar_region.html#conserved
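For anyone scripting this, here is a minimal Python sketch that assembles the two ANNOVAR calls described above (download the database, then run the region annotation). The flags and the mce46way database name come from the post; the input file name my_snps.avinput and the helper annovar_commands are made up for illustration. Depending on your genome build you may also need ANNOVAR's -buildver option, which is not shown here.

```python
import shlex


def annovar_commands(input_file, db_dir="humandb/", dbtype="mce46way"):
    """Build the download and region-annotation command lines as argument lists.

    input_file is a hypothetical ANNOVAR-format variant file; db_dir and
    dbtype default to the values mentioned in the post.
    """
    download = ["annotate_variation.pl", "-downdb", dbtype, db_dir]
    annotate = [
        "annotate_variation.pl",
        "-regionanno",
        "-dbtype", dbtype,
        input_file,
        db_dir,
    ]
    return download, annotate


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for cmd in annovar_commands("my_snps.avinput"):
        print(shlex.join(cmd))
```

The argument lists can be passed to subprocess.run(cmd, check=True) once ANNOVAR is on your PATH.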
UCSC provides a mammalian conservation score via its Conservation track.
In case you wanna do this by yourself, PSI-BLAST your sequence against UniProt, grab the PSSM and parse it for the mutant position. This way you get the score for the wildtype and the mutant residue. The same could be done for the percentage matrix in case you prefer that. Of course this approach is only applicable if you're talking about nsSNPs at the protein level.
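A rough sketch of the parsing step in Python, assuming the PSSM was written as plain text with psiblast's -out_ascii_pssm option and that its columns are whitespace-separated (position, residue, 20 log-odds scores, then the percentages). The function names parse_ascii_pssm and substitution_scores are made up for illustration, and the layout assumption should be checked against your own file.

```python
def parse_ascii_pssm(lines):
    """Return one (residue, {amino_acid: score}) tuple per sequence position."""
    columns = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if columns is None:
            # The header row lists the amino-acid letters (scores, then percentages).
            if len(fields) >= 20 and all(len(f) == 1 and f.isalpha() for f in fields):
                columns = fields[:20]
            continue
        # Data rows start with the position number and the query residue.
        if fields[0].isdigit() and len(fields) >= 22:
            scores = dict(zip(columns, (int(x) for x in fields[2:22])))
            rows.append((fields[1], scores))
    return rows


def substitution_scores(rows, position, wildtype, mutant):
    """Return (wildtype_score, mutant_score) at a 1-based protein position."""
    residue, scores = rows[position - 1]
    if residue != wildtype:
        raise ValueError(
            f"position {position} is {residue} in the PSSM, not {wildtype}"
        )
    return scores[wildtype], scores[mutant]
```

With a file on disk, something like rows = parse_ascii_pssm(open("query.pssm")) followed by substitution_scores(rows, 42, "R", "W") gives the two scores for an R42W change (file name and variant are placeholders). The wildtype check guards against off-by-one numbering between your SNP annotation and the query sequence.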
Chris
Avilella: Do you have a link to documentation or a manuscript about GERP scores?
http://www.ensembl.org/info/docs/compara/analyses.html#conservation
http://www.genome.org/cgi/content/abstract/15/7/901
Thanks a lot !!!