Hello, I have a list of 8M SNPs from a meta-GWAS that does not include the chromosome or the base-pair position of each SNP. I thought I could retrieve these using the grch37.ensembl.org mart of BioMart (I have never done this before and I am still learning), but I have read in other posts that there is a 500-line limit.
My list contains about 8.5M rsIDs, and I have read in other posts that there are other ways to do this, such as parsing the VCF, using the APIs, or using the VEP.
The problem is that I do not know how to do any of these, and I have not managed to find a tutorial to learn from. Could anyone point me to where I can learn to retrieve the chromosome and position for the rsIDs in my list?
Thanks in advance.
An alternative would be to use Galaxy: https://usegalaxy.org/
Thank you so much! This sounds exactly like what I needed. I am giving it a try right now, will let you know if it worked. :)
Edit: It worked just fine, although it was only able to retrieve 3.5 million of the 8 million SNPs in my file. I assume this is normal, as some of the SNPs might not be in the database or identified yet, but I was wondering if there is a way to double-check this.
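One way to double-check is to list exactly which rsIDs came back without a position, so that a handful of them can be looked up by hand. Below is a minimal Python sketch of that comparison. It assumes both the input list and the annotated result can be reduced to one rsID per line; the function name and the file names in the usage note are hypothetical and not part of Galaxy.

```python
def split_found_missing(input_ids, annotated_ids):
    """Compare the submitted rsIDs with the rsIDs that were annotated.

    Both arguments are iterables of strings (for example, open file handles
    with one rsID per line). Blank lines and surrounding whitespace are ignored.
    Returns two sets: (found, missing).
    """
    wanted = {rsid.strip() for rsid in input_ids if rsid.strip()}
    annotated = {rsid.strip() for rsid in annotated_ids if rsid.strip()}
    found = wanted & annotated    # submitted and annotated
    missing = wanted - annotated  # submitted but not annotated
    return found, missing
```

With hypothetical files input_rsids.txt and annotated_rsids.txt, this could be called as split_found_missing(open("input_rsids.txt"), open("annotated_rsids.txt")), and the missing set written out for spot checks. Sets of a few million short strings should fit in memory on a typical workstation.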
You can use the Ensembl VEP (command line version): https://grch37.ensembl.org/info/docs/tools/vep/script/index.html
Use your rsIDs as input: https://grch37.ensembl.org/info/docs/tools/vep/vep_formats.html#id The default output includes the locations: https://grch37.ensembl.org/info/docs/tools/vep/vep_formats.html#output
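Once VEP has run, the chromosome and position still need to be pulled out of its output. Below is a minimal Python sketch that does this, assuming the default tab-delimited VEP output in which the first column is the uploaded identifier and the second column is the location written as chromosome:position or chromosome:start-end. The function name is hypothetical, and the rsIDs in the usage note are made up for illustration.

```python
def parse_vep_locations(lines):
    """Yield (rsid, chromosome, start) tuples from VEP tab-delimited output.

    Assumes column 1 is the uploaded identifier and column 2 is the
    location ("chr:pos" or "chr:start-end"). Header lines start with "#".
    VEP usually writes several rows per variant (one per affected feature),
    so each (identifier, location) pair is reported only once.
    """
    seen = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        rsid, location = fields[0], fields[1]
        if (rsid, location) in seen:
            continue
        seen.add((rsid, location))
        chrom, _, span = location.partition(":")
        start = int(span.split("-")[0])  # keep the start of a range
        yield rsid, chrom, start
```

For example, feeding it the rows "rs123<TAB>1:1000<TAB>A" (twice) and "rs456<TAB>X:200-202<TAB>-" yields one tuple for rs123 on chromosome 1 at position 1000 and one for rs456 on chromosome X at position 200. Passing an open file handle instead of a list lets it stream through a large output file line by line.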