Is there any database out there that has pre-computed LD data for hg19 using 1000 Genomes? I need to map a given SNP to the genes that are in LD with it. Ideally I am looking for something like scan-db; they provide nice data (see below), but the data is a bit outdated.
Thanks,
Shameer
Thanks Emily, can you please point me to an example that shows how to access the pre-computed data via the Ensembl Perl API?
There's some sample code that you can use here.
Great Emily,
I was able to run the example and get the r2 information for the SNPs. See below:
I have a few questions:
Thoughts?
1. We calculate it on the fly.
2. You can go from a variant to other variants in LD with it. You would need to:
3. You can map to genes by getting the TranscriptVariation from the VariationFeature. From there, you can get to an Ensembl Transcript. This will then get you into our Core API, where you can get the Gene, and get its external name.
I can help you out with any of this.
Thanks Emily,
I am looking for a way to integrate all of the modules you have mentioned to create a file that takes a list of SNPs
(example here) and generates an output in the following format:
You'll need to write a Perl script that reads through the list, one-by-one, then feeds the IDs in to all those steps I mentioned.
Sure, that's what I have been trying to do since last evening. With limited documentation and some of the methods marked as under development, I am a bit disappointed with the progress. Would you be able to provide an example that links those modules? Appreciate your help!
I've written the following script for you that will go through the lines one by one. I tested it on just a single variant and it gave thousands of lines of data in response so you might want to add in some filtering. For example, at the moment it's giving all the 1000 genomes populations - you might want to filter down to your favourites. It's also giving all linked variants with r2 values - you could choose an r2 cut-off instead.
Hi Emily, Thanks for the example code. I have tried running it and am having difficulty when fetching by name. I don't know perl, so working with the API has been painful.
I believe you previously answered my question on a related blog post. I have a few remaining questions regarding your script:
I am looking forward to any assistance you may be able to provide. Just FYI, I am using an open science platform for my project that pays people who provide assistance. If you help and leave a comment on the relevant discussion you can get rewarded for your efforts.
I just came across the Ensembl REST API. We would much prefer to use this than the perl API. Does it support retrieving SNPs in LD?
There is no LD endpoint on the REST API.
Specify population names by editing the following line to give specific names. At the moment it's just getting population names that include the phrase 1000GENOMES, but you can put in the full population names
Include a cutoff by editing the following line to include a > or <
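The two filters described above (restricting to named populations and applying an r2 cut-off) can be sketched as follows. This is a minimal illustration in Python rather than an edit to the Perl script itself, and it makes several assumptions: LDRecord and filter_ld are hypothetical names, not part of the Ensembl API; the population strings are placeholders in the 1000GENOMES style, so check the exact names your Ensembl release reports; and the variant IDs and r2 values are made up.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class LDRecord(NamedTuple):
    """One pairwise LD result (hypothetical structure for illustration)."""
    variant1: str
    variant2: str
    population: str
    r2: float


def filter_ld(records: Iterable[LDRecord],
              populations: Set[str],
              min_r2: float) -> Iterator[LDRecord]:
    """Yield only records from the wanted populations with r2 >= min_r2."""
    for rec in records:
        if rec.population in populations and rec.r2 >= min_r2:
            yield rec


# Made-up records; the population names are placeholders, not verified names.
records = [
    LDRecord("rs1", "rs2", "1000GENOMES:CEU", 0.92),
    LDRecord("rs1", "rs3", "1000GENOMES:CEU", 0.31),
    LDRecord("rs1", "rs2", "1000GENOMES:YRI", 0.88),
]

wanted = {"1000GENOMES:CEU"}
kept = list(filter_ld(records, wanted, min_r2=0.5))
for rec in kept:
    print(rec.variant1, rec.variant2, rec.population, rec.r2)
```

Matching on the full population name, rather than on the substring 1000GENOMES, is what narrows the output to your chosen populations.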
Great, thanks a lot Emily!
Tried the script, am getting an error. Do I have to use any other modules or pass additional parameters?
Ran diagnostics and getting the following:
Thanks again!
Looks like it's not getting a value for $v, so isn't able to call get_all_VariationFeatures on it. The script is assuming that every line of your input file is just rsID\n. If there's anything else on the line then you need to either edit the script to only read the ID, or edit your input file.
Thanks for your help with code-review Emily, I was able to fix it. Will post my working version of the code here.
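For the input-format problem discussed above (the script expects each line to be just an rsID), one way to make the reading step tolerant of extra columns or whitespace is to pull only the ID out of each line. The sketch below shows the idea in Python rather than Perl, purely as an illustration. extract_rsids and RSID_PATTERN are hypothetical names, and it assumes IDs have the form rs followed by digits.

```python
import re
from typing import Iterable, Iterator

# Assumption: variant IDs look like "rs" followed by one or more digits.
RSID_PATTERN = re.compile(r"\brs\d+\b")


def extract_rsids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first rsID found on each line, skipping lines without one."""
    for line in lines:
        match = RSID_PATTERN.search(line)
        if match:
            yield match.group(0)


# Made-up input lines: a clean ID, an ID with extra columns, and two
# lines (blank and comment) that contain no ID at all.
sample_lines = ["rs123\n", "  rs456\tchr1\t100\n", "\n", "# comment\n"]
ids = list(extract_rsids(sample_lines))
print(ids)
```

Lines with no recognisable ID are skipped here, so the later lookup never receives an empty or malformed name.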
My current concern is that the program is incredibly slow (100 SNPs in 48 hours; only CEU, r2>=0.5). Is that because of accessing the data directly from Ensembl? Other thoughts to improve the speed?
I'm not surprised it's slow. It's calculating all of these LD values on the fly. Remember that when you filter by r2, you're still calculating r2 for all the variants that don't pass your filter; you're just not printing them.
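To illustrate why an r2 cut-off does not reduce the work, here is a toy Python sketch. It is not Ensembl's implementation: it computes r2 from the standard definition r2 = D^2 / (pA(1-pA)pB(1-pB)) on small made-up phased haplotypes, and counts how many pairs are evaluated. The names r_squared and ld_pairs and the site labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def r_squared(hap_a: List[int], hap_b: List[int]) -> float:
    """r2 between two biallelic sites, given phased 0/1 alleles per haplotype."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic site: r2 is undefined, treated as 0 here
    d = p_ab - p_a * p_b
    return d * d / denom


def ld_pairs(haplotypes: Dict[str, List[int]],
             min_r2: float) -> Tuple[List[Tuple[str, str, float]], int]:
    """Return the pairs passing the cut-off and the number of pairs computed."""
    computed = 0
    kept = []
    for (id1, h1), (id2, h2) in combinations(haplotypes.items(), 2):
        r2 = r_squared(h1, h2)  # computed for every pair, whatever the cut-off
        computed += 1
        if r2 >= min_r2:
            kept.append((id1, id2, r2))
    return kept, computed


# Made-up haplotype data for three sites across four haplotypes.
haplotypes = {
    "siteA": [0, 0, 1, 1],
    "siteB": [0, 0, 1, 1],
    "siteC": [0, 1, 0, 1],
}
kept, computed = ld_pairs(haplotypes, min_r2=0.5)
print(len(kept), "pair(s) kept out of", computed, "computed")
```

Every pair is evaluated before the cut-off is applied, so raising the threshold shortens the output but not the run time.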
True that!
The run is now in progress, but I am getting the following error for random SNPs: Is this a known issue? Any suggestions to solve this?
Looks like you're losing connection to the database. We're not aware of any reason why this would be happening - could be a connection issue at your end.
Check on it.
Alternatively, I am wondering whether I can run the same query using a local installation of the ensembl-human + variation databases? Any suggestions?
You can. You'll need to install the MySQL core and variation databases for human on your system. There's more info here.