How to convert SNP genome positions to variant identifiers and genome annotations
8.9 years ago
Tim • 0

Hi Biostars,

I would like to learn how to convert genome positions (e.g., Chr6:467841) into other useful identifiers and annotations. For example, I use vcftools to extract only the SNPs in ".012" format, which also writes the site locations (i.e., genome positions) to a ".012.pos" file. I use the following command:

vcftools --vcf xxx.vcf --out SNP --remove-indels --012

This creates "SNP.012", which contains only 0, 1, and 2 values, and "SNP.012.pos", which contains the site locations, for example:

Chr1    2673
Chr1    2695
Chr1    2696

I would like to match these site locations (i.e., genome positions) to variant identifiers and genome annotations. I have had some success loading a GFF3 file (e.g., a genome annotation downloaded from NCBI) and doing left/right joins in R, but that feels somewhat ad hoc. I also tried Bioconductor packages (GenomicRanges, GenomicFeatures, biomaRt) but couldn't find an efficient, best-practice approach. FYI, I prefer working in R/Bioconductor.

Thanks!

snp vcftools genome • 2.5k views
8.9 years ago

Why not use one of the available variant annotation tools, like ANNOVAR or SnpEff, with the original VCF? They report each variant relative to known features and also classify mutations in coding sequences (synonymous, missense, nonsense, splicing), which is impossible from your SNP.012.pos because it lacks the nucleotide change. You can always filter the output to keep only SNPs.
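
To illustrate why the original VCF matters, here is a minimal sketch that looks the sites from SNP.012.pos back up in the VCF and prints the ID, REF and ALT columns for each one. It assumes both files are tab-separated and that the VCF uses the standard column order (CHROM, POS, ID, REF, ALT). The toy file contents, including the identifier rs0001, are made up for illustration; if your VCF was never annotated with identifiers, the ID column will simply be ".".

```shell
# Toy inputs (made-up data, for illustration only); in practice point awk at
# your own xxx.vcf and the SNP.012.pos written by vcftools.
tmp=$(mktemp -d)
printf 'Chr1\t2673\nChr1\t2696\n' > "$tmp/SNP.012.pos"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/xxx.vcf"
printf 'Chr1\t2673\trs0001\tA\tG\t.\tPASS\t.\n' >> "$tmp/xxx.vcf"
printf 'Chr1\t2680\t.\tC\tCT\t.\tPASS\t.\n' >> "$tmp/xxx.vcf"
printf 'Chr1\t2696\t.\tT\tC\t.\tPASS\t.\n' >> "$tmp/xxx.vcf"

# First file: remember every CHROM/POS pair listed in the .pos file.
# Second file: skip header lines, print CHROM, POS, ID, REF, ALT for matches.
sites=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { want[$1 FS $2]; next }
  !/^#/ && (($1 FS $2) in want) { print $1, $2, $3, $4, $5 }
' "$tmp/SNP.012.pos" "$tmp/xxx.vcf")

printf '%s\n' "$sites"
rm -r "$tmp"
```

This only recovers what is already in the VCF. It does not classify the effect of a change, which is what the annotation tools add.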

I had to analyze the genotype matrix ("012" format) in R to identify "important" SNPs. I feel there must be a straightforward way of going from a site location (genome position) to variant identifiers, gene IDs, and/or known annotations. In other words, given a list of site locations (like Chr1 2673), what's the best way to get annotations from RefSeq, Ensembl, and similar sources (downloaded in GFF3 or GTF format, or accessed via an API)? Any help would be appreciated!
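
To make the question concrete, the lookup I have in mind can be sketched roughly as follows, written with awk rather than R just to keep it dependency-free. It assumes a tab-separated GFF3 file with "gene" features and that the chromosome names in the GFF3 match those in SNP.012.pos (NCBI files often use accession-style sequence names, so a renaming step may be needed first). The file name genes.gff3 and the gene records are hypothetical.

```shell
# Toy inputs (made-up data, for illustration only).
tmp=$(mktemp -d)
printf 'Chr1\t2673\nChr1\t2695\nChr1\t9000\n' > "$tmp/SNP.012.pos"
printf '##gff-version 3\n' > "$tmp/genes.gff3"
printf 'Chr1\ttoy\tgene\t2500\t2700\t.\t+\t.\tID=gene0001;Name=toyA\n' >> "$tmp/genes.gff3"
printf 'Chr1\ttoy\tgene\t5000\t6000\t.\t-\t.\tID=gene0002;Name=toyB\n' >> "$tmp/genes.gff3"

# First file: collect the positions seen on each chromosome.
# Second file: for every gene feature, report the positions that fall inside
# its start..end interval (GFF3 and VCF coordinates are both 1-based, inclusive).
hits=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { pos[$1] = pos[$1] " " $2; next }
  !/^#/ && $3 == "gene" && ($1 in pos) {
    n = split(pos[$1], p, " ")
    for (i = 1; i <= n; i++)
      if (p[i] + 0 >= $4 + 0 && p[i] + 0 <= $5 + 0)
        print $1, p[i], $9
  }
' "$tmp/SNP.012.pos" "$tmp/genes.gff3")

printf '%s\n' "$hits"
rm -r "$tmp"
```

Positions outside every gene (Chr1 9000 in the toy data) produce no output. The sketch checks every position against every gene on the same chromosome, so it only shows the logic and is not meant to be fast.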

Thanks for the great suggestions. I will look more into Annovar and SnpEff.
