Retrieve rs IDs from chromosome location info on hg19 build
16 months ago

Hello, community!

I am wondering if it is possible to obtain the rs IDs of variants when the information I have looks like this:

chr1:123456  123456    123456  A    G 

The column names are "variant_id", "start_hg19", "end_hg19", "ref", and "alt", respectively. I have 706 variants in my dataframe. I tried, unsuccessfully, to convert the coordinates from hg19 to hg38 before searching for rs IDs, but I think it would be easier to look up the rs IDs first and then run the hg19 -> hg38 conversion and the variant annotation with the BioMart package.

I would appreciate any help or insights that could give me a way to solve this.

Thanks in advance!

Variant-Annotation rsIDs • 745 views

You could search for these directly in the GRCh37 dbSNP VCF from NCBI: https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_dbSNP_all.vcf.gz

The example you posted does not look like a real variant. Also, the chr1 notation will need to be changed to the NCBI accession (NC_000001.10), since that is the naming this VCF uses. @Devon Ryan has the mappings here: https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt
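In case it helps, here is a rough sketch of that renaming step with awk. It assumes the mapping file is two tab-separated columns (NCBI accession first, UCSC name second) and that your table is tab-separated with the columns you listed. The file names mapping_sample.txt, variants_sample.tsv and targets.tsv are made up for the example, and the two small files are stand-ins for the real ones.

```shell
# Stand-in for the mapping file (assumed layout: NCBI accession <TAB> UCSC name).
printf 'NC_000001.10\tchr1\nNC_000002.11\tchr2\n' > mapping_sample.txt

# Stand-in for the variants table: variant_id, start_hg19, end_hg19, ref, alt.
printf 'chr1:123456\t123456\t123456\tA\tG\n' > variants_sample.tsv

# First file: remember the NCBI accession for each UCSC name.
# Second file: take the chromosome from "chrN:pos" and print "accession <TAB> position".
awk -F'\t' 'NR==FNR { ncbi[$2] = $1; next }
            { split($1, id, ":"); if (id[1] in ncbi) print ncbi[id[1]] "\t" $2 }' \
    mapping_sample.txt variants_sample.tsv > targets.tsv

cat targets.tsv
```

The resulting two-column file (chromosome, position) should be usable as a targets file for bcftools query via -T, but that only restricts by position, so REF and ALT still need to be compared afterwards.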


Thanks for the idea! Would you have an example of the code I would need to run this? I am using RStudio.


You can use bcftools query like this:

$ bcftools query -f '%ID %CHROM %POS %REF %ALT\n' GRCh37_latest_dbSNP_all.vcf.gz | head -5
rs1570391677 NC_000001.10 10001 T A,C
rs1570391692 NC_000001.10 10002 A C
rs1570391694 NC_000001.10 10003 A C
rs1639538116 NC_000001.10 10007 T C,G
rs1570391698 NC_000001.10 10008 A C,G,T

To search with your own IDs, use the approach from the answer here: https://bioinformatics.stackexchange.com/questions/18431/filtering-a-vcf-with-a-text-file-of-snp-rsids
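If you would rather match on position and alleles than on an ID, something along these lines might work on the bcftools query output shown above (ID CHROM POS REF ALT, space-separated). This is only a sketch: dbsnp_sample.txt holds two of the lines from that output, my_variants.txt is a hypothetical file of your variants already renamed to NCBI accessions (CHROM POS REF ALT), and the variants in it are invented for illustration.

```shell
# Two lines taken from the bcftools query output above (ID CHROM POS REF ALT).
cat > dbsnp_sample.txt <<'EOF'
rs1570391677 NC_000001.10 10001 T A,C
rs1639538116 NC_000001.10 10007 T C,G
EOF

# Hypothetical variants to annotate: CHROM POS REF ALT.
cat > my_variants.txt <<'EOF'
NC_000001.10 10007 T G
NC_000001.10 10001 T G
EOF

# First file: index every rs ID by CHROM, POS, REF and each single ALT allele.
# Second file: append the matching rs ID, or NA when no allele matches.
awk 'NR==FNR { n = split($5, alts, ",")
               for (i = 1; i <= n; i++) rs[$2 FS $3 FS $4 FS alts[i]] = $1
               next }
     { key = $1 FS $2 FS $3 FS $4
       print $0, ((key in rs) ? rs[key] : "NA") }' \
    dbsnp_sample.txt my_variants.txt > annotated.txt

cat annotated.txt
```

Splitting the ALT column on commas matters because dbSNP lists several alternate alleles on one line, so a plain string comparison against "C,G" would miss a variant whose alt is just G.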
