16 months ago
elielsonveloso
Hello, community!
I am wondering whether it is possible to obtain the rs IDs of variants when the information I have looks like this:
chr1:123456 123456 123456 A G
The column names are "variant_id", "start_hg19", "end_hg19", "ref", and "alt", respectively. I have 706 variants in my dataframe. I tried, unsuccessfully, to convert the coordinates from hg19 to hg38 before searching for rs IDs, but I think it would be easier to find the rs IDs first and then run the hg19 -> hg38 conversion and variant annotation with the BioMart package.
I would appreciate any help or insights that could point me toward a solution.
Thanks in advance!
You could search for these directly in the GRCh37 VCF from NCBI: https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_dbSNP_all.vcf.gz
You don't seem to have a real example up there.
The chr1 notation will need to be changed to the NCBI style (NC_000001.10). @Devon Ryan has the mappings here: https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt
Thanks for the idea!! Would you have an example of the code I would have to use to run this? I am using RStudio!
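The renaming step mentioned above (chr1 to NC_000001.10) could be sketched roughly as follows. This is in Python rather than R, and it is only a minimal sketch: it assumes the mapping file is tab-separated with the NCBI accession in the first column and the UCSC name in the second. The two sample lines and the helper names load_ucsc_to_ncbi and to_ncbi are made up for illustration.

```python
import csv
import io

# Two sample lines in the ASSUMED layout of GRCh37_NCBI2UCSC.txt
# (tab-separated: NCBI accession first, UCSC name second).
SAMPLE_MAPPING = "NC_000001.10\tchr1\nNC_000002.11\tchr2\n"


def load_ucsc_to_ncbi(handle):
    """Return a dict mapping UCSC names (chr1) to NCBI accessions."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip rows that have no UCSC name in the second column.
        if len(row) >= 2 and row[1]:
            mapping[row[1]] = row[0]
    return mapping


def to_ncbi(variant_id, mapping):
    """Split an ID such as 'chr1:123456' into (NCBI accession, position)."""
    chrom, pos = variant_id.split(":")
    return mapping[chrom], int(pos)


ucsc_to_ncbi = load_ucsc_to_ncbi(io.StringIO(SAMPLE_MAPPING))
print(to_ncbi("chr1:123456", ucsc_to_ncbi))
```

With the real file, the io.StringIO object would be replaced by an ordinary open() on the downloaded mapping table.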
You can use bcftools query like this:
But to search with your ID, just use the answer here: https://bioinformatics.stackexchange.com/questions/18431/filtering-a-vcf-with-a-text-file-of-snp-rsids
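As an alternative to the command-line route, the lookup itself can be sketched in Python: scan the VCF once and keep the ID of every record whose chromosome, position, ref, and alt match one of the variants of interest. This is a rough sketch under several assumptions: the chromosome names have already been converted to NCBI accessions, start_hg19 is 1-based like the VCF POS column, and alleles are written the same way in both files. The VCF records and rs IDs below are invented placeholders, and find_rsids is a hypothetical helper name.

```python
import io

# Invented VCF records for illustration only; the IDs are placeholders,
# not real dbSNP entries.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_000001.10\t123456\trs_placeholder_1\tA\tG,T\t.\t.\t.\n"
    "NC_000001.10\t200000\trs_placeholder_2\tC\tT\t.\t.\t.\n"
)


def find_rsids(vcf_handle, wanted):
    """Map each wanted (chrom, pos, ref, alt) key found in the VCF to its ID."""
    hits = {}
    for line in vcf_handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, rsid, ref, alts = line.rstrip("\n").split("\t")[:5]
        # ALT can list several alleles separated by commas.
        for alt in alts.split(","):
            key = (chrom, int(pos), ref, alt)
            if key in wanted:
                hits[key] = rsid
    return hits


# Variants of interest, already in NCBI chromosome notation.
wanted = {("NC_000001.10", 123456, "A", "G")}
hits = find_rsids(io.StringIO(SAMPLE_VCF), wanted)
print(hits)
```

For the real gzipped file, the handle would come from gzip.open(path, "rt") instead of io.StringIO. With only 706 variants, a single pass like this avoids loading the whole VCF into memory.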