I have thousands of nucleotide changes like the one below (coordinates are based on hg18):
chr    position  ref_nucleotide  var_nucleotide
chr11  1112345   A               T
First, I want to identify the most important transcript (or all transcripts?) that each change occurs in, find the gene name, and find the amino acid change. If the change falls in a Swiss-Prot protein domain, I also want the name of that protein so I can use the PolyPhen-2 pre-annotated data to predict deleteriousness.
How can I do this using publicly available datasets? I can program, but I need guidance in the form of "You need to get the transcript name from this table, which you can download from (e.g. the UCSC Table Browser, or this FTP address), and find the AA position of your change in this related table", etc. Although this is not a programming question, any simple solutions would also be appreciated.
There is a similar discussion here and here. Although there are very good answers in those discussions, I am looking for a more detailed and practical recipe than general information like "you can get this from Ensembl or BioMart".
Please feel free to answer just part of this question, since we can combine the answers into a pipeline. Thanks.
Actually, snpEff supports VCF input (see option -vcf4: http://snpeff.sourceforge.net/manual.html) :-)
Sorry Pablo, that's exactly what I was trying to say in my answer. I rephrased it to be clearer. Thanks.
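For anyone taking the VCF route, here is a minimal Python sketch of turning the four-column format from the question into minimal VCF records. It is only an illustration under some assumptions: the function names are made up for this example, the input is whitespace-separated, and the positions are assumed to be 1-based (which VCF requires; the question does not say). The ID, QUAL, FILTER and INFO columns are filled with "." because the question has no data for them.

```python
# Hypothetical helper names; not part of snpEff or any other tool.
VCF_HEADER = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf_line(fields):
    """Build one tab-separated VCF record from [chrom, pos, ref, alt]."""
    chrom, pos, ref, alt = fields
    # Assumption: pos is already 1-based, as VCF expects.
    return "\t".join([chrom, pos, ".", ref.upper(), alt.upper(), ".", ".", "."])


def to_vcf(rows):
    """Convert an iterable of 'chr pos ref var' text rows into VCF text."""
    lines = list(VCF_HEADER)
    for row in rows:
        fields = row.split()
        # Skip blank lines, column-header lines, and anything malformed.
        if len(fields) != 4 or not fields[1].isdigit():
            continue
        lines.append(to_vcf_line(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        "chr position ref nucleotide var nucleotide",
        "chr11 1112345 A T",
    ]
    print(to_vcf(example), end="")
```

Check that the chromosome names and genome build (hg18 here) match whatever database the annotation tool is using before running it on the result.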